DOCUMENT RESUME

ED 120 732                                        CS 202 585

AUTHOR          Grommon, Alfred H., Ed.; And Others
TITLE           Reviews of Selected Published Tests in English.
INSTITUTION     National Council of Teachers of English, Urbana,
                Ill.
PUB DATE        76
NOTE            165p.
AVAILABLE FROM  National Council of Teachers of English, 1111 Kenyon
                Road, Urbana, Illinois 61801 (Stock No. 41218, $4.25
                nonmember, $3.90 member)
EDRS PRICE      MF-$0.83 HC-$8.69 Plus Postage
DESCRIPTORS     Composition (Literary); Elementary Secondary
                Education; *English Instruction; Language
                Development; Literature; *Standardized Tests;
                Testing; Test Reliability; *Test Reviews; Test
                Selection; Test Validity



ABSTRACT 

This publication is an outgrowth of the concern
expressed by the National Council of Teachers of English regarding
the nature, uses, and misuses of standardized tests, specifically the
validity of the subject-matter content of standardized English tests
and the relation of the overall purposes of a school's total English
program to standardized test results covering only a small segment of
the subject. Part 1 of the book offers a context for current teaching
that is somewhat larger than that represented by an individual
teacher's experiences with standardized tests or by a review of a
particular test. Part 2 contains 58 reviews representing evaluations
of 51 different tests (excluding tests of reading skills). Part 3
summarizes some of the problems in educational accountability and
suggests specific things that teachers can do. Contributors are:
Alfred H. Grommon, Walter Loban, William Jenkins, J. N. Hook,
Richard Braddock, and Alan C. Purves. (J(t)



Documents acquired by ERIC include many informal unpublished
materials not available from other sources. ERIC makes every effort
to obtain the best copy available. Nevertheless, items of marginal
reproducibility are often encountered and this affects the quality
of the microfiche and hardcopy reproductions ERIC makes available
via the ERIC Document Reproduction Service (EDRS). EDRS is not
responsible for the quality of the original document. Reproductions
supplied by EDRS are the best that can be made from the original.







Reviews of Selected Published Tests in English 






and Richard Braddock, University of Iowa

J. N. Hook, University of Illinois

William A. Jenkins, Florida International University

Walter Loban, University of California, Berkeley

Alan Purves, University of Illinois






National Council of Teachers of English
1111 Kenyon Road, Urbana, Illinois 61801






Prepared by the NCTE Committee to Review Standardized Tests,
Alfred H. Grommon, Chairman.

NCTE EDITORIAL BOARD Charles R. Cooper, Evelyn Copeland,
Bernice E. Cullinan, Richard Lloyd-Jones, Frank Zidonis,
Robert F. Hogan, ex officio, Paul O'Dea, ex officio.

BOOK DESIGN Bob Bingenheimer

ISBN 0-8141-4121-8
NCTE Stock Number 41218

Copyright © 1976 by the National Council of Teachers of English.
All rights reserved. Printed in the United States of America.

Library of Congress Catalog Card Number 75-26123



Contents

Introduction

Part One

Statewide Accountability Programs of Testing and Assessment
Alfred H. Grommon

Part Two

Language Development and Its Evaluation
Walter Loban

Elementary School Language Tests
William A. Jenkins

Tests on the English Language
J. N. Hook

Evaluation of Writing Tests
Richard Braddock

Literature Tests
Alan C. Purves

Part Three

Problems and Recommendations
Alfred H. Grommon

Afterword

Index of Tests Reviewed






Introduction

Alfred H. Grommon



As a consequence of their involvement in systems of educational
accountability throughout the nation, English teachers are perforce
encountering uses of standardized tests of some kind. Considerable
evidence indicates widespread interest, but mostly genuine concern,
among teachers of English about uses of standardized tests related
to aspects of English programs in schools. According to Leon
Lessinger, over 4,000 books and articles on accountability were
published between 1970 and 1974 ("Holding the Accountability
Movement Accountable," Phi Delta Kappan 55 [June 1974], p. 657). Most
plans of statewide testing and assessment now functioning include
some form of standardized testing of aspects of English grammar,
sentence structure, usage, vocabulary, effectiveness of written
expression, punctuation, capitalization, spelling, and literature. The
majority also include tests of reading skills.

Questions about the purposes of testing programs and the selec- 
tion of tests continue to arise among teachers using standardized 
tests presumably designed to yield meaningful information about 
pupils' knowledge of and skills in using the English language and in 
reading literature. Just what are the purposes of these extensive test- 
ing programs? Who selects the tests? What criteria are applied in 
choosing tests? How valid is the content of each of these
instruments? How representative of outcomes of the entire program of
English are the results of tests focused upon limited segments of
such a complex subject? How are results to be used in accounting
to the public and to educational agencies and authorities and in
making state and local decisions about educational policies? What
effects may test results have upon the status of individual teachers 
and upon states' allocations of funds to public schools in general 
and to a school district or school in particular? 




Because of these questions and other related problems, the
National Council of Teachers of English has for some time been
concerned with the nature, uses, and misuses of standardized tests. The
Council has been interested especially in two major questions:

(1) What is the validity of the subject-matter content of
standardized English tests; that is, what is the relation between what
is known now about the subject of English, and the teaching
of it, and the conceptions of it explicitly or implicitly
underlying the format of, and subject-matter items in, such tests?

(2) How representative of the larger outcomes of the purposes
and breadth of a school's total program of English are results
of a standardized test designed to produce information about
pupils' knowledge and skills related to only a small segment
of the subject?

To inform the profession and the public about the nature and scope
of English as a school subject and about the teaching of it, the Coun- 
cil — almost from its inception in 1911 — has been publishing articles, 
monographs, books, reports presenting results of inquiries into 
many aspects of the subject. 

Legislators, state and local school administrators, parents, and
teachers committed to using standardized tests as a means of
appraising pupils' English skills should be informed also on such
NCTE publications. Any decisions about the purposes of testing in
English and about the selection of a particular standardized test
should be based upon some familiarity with these and other
contributions by leaders in the field. The Council's publications are
evidence of a long and continuing mission of bringing to the
profession the best scholarly research available, representative points of
view, and stimulating accounts of experiences in the teaching of
English. All constitute an overview of the scope of English in the
schools and create a much larger construct of English than can be
reflected in any standardized tests of small parts of pupils'
experiences with this subject. The Council's publications, cited in a
chronological list of Selected References at the end of this Introduction,
provide a background against which any uses of standardized and
other forms of tests, including the interpretation and uses of their
results, should be considered carefully.

As the publications indicate, the Council has been giving
continued attention to the nationwide insistence upon holding schools
accountable to the public for the effectiveness of their educational
programs and their teaching and the accompanying uses of
standardized tests in English. These publications, including this one, are an
outgrowth, in part, of the concerns expressed in the following
resolutions on accountability and on the use of standardized English tests
passed by the NCTE membership at the Annual Business Meeting in
November, 1971.

On Accountability

Background. English teachers recognize their accountability to
various groups--to students, to colleagues both within and without
the discipline of English, to parents, to the local community which
supports the schools, and to the wider communities beyond it.
However, they reject the view that their goals and objectives can be stated
only in quantifiably measurable terms, describing the behavior their
students will display at the completion of instruction.

Moreover, just as important as the English teacher's accounta- 
bility to his students, to his colleagues, and to the communities 
which have a responsible interest in his activities, is the accounta- 
bility of each of these groups to him. Students are responsible for be- 
ing active participants in the learning process. Parents are respon- 
sible for supplying a nurturing environment and for being valued 
colleagues in developing appropriate learning programs.
Administrators and others who provide the school climate are
responsible for fostering the teaching process. The wider communities are
responsible for providing financial, cultural, and social support. It is
now part of the English teacher's obligation to clarify for himself, his
students, his colleagues, and his several communities how he can be 
accountable. Be it therefore 

Resolved, That the National Council of Teachers of English (1)
describe the diverse and appropriate ways it is possible to know that
students are learning, and (2) recommend the most effective means
of communicating this information as well as teachers' expectations
about the responsibilities that students, parents, administrators, 
and the general public have to the educational program of the com- 
munity. 

On the Use of Standardized Tests

Background. Standardized tests of achievement in English and 
reading have been subjects of growing controversy. Some test norms 
were established long ago or were based on populations that do not 
resemble the population being tested. The contents of many tests, 
moreover, are widely regarded as culturally biased or pertinent to
outdated curricula. Moreover, many students who fail to demonstrate
reading competency on standardized tests can and do read
materials of interest to them. 

Clearly other measures than standardized tests are needed to 
evaluate achievement in language arts skills. These include locally 
prepared tests of language arts skills, surveys of students' reading 
habits, and evaluations by teachers who work daily with students. Be 
it therefore 

Resolved, That the National Council of Teachers of English urge
local school districts, colleges, and state agencies

(1) to re-examine standardized tests of English and reading in
order to determine the appropriateness of their content to
actual instructional goals and the appropriateness of the test
norms to students;

(2) to study problems in the use and interpretation of these tests; 
and 

(3) to consider carefully means other than standardized tests, in- 
cluding student self-evaluation, of assessing the language arts 
skills of students. 

NCTE Committee to Review Standardized Tests

The Committee on Research of the National Council of Teachers 
of English appointed an ad hoc committee to examine a wide sam- 
pling of English tests commercially prepared and published and 
readily available to teachers of English, and to prepare a report of 
reviews written by members of this NCTE Committee to Review 
Standardized Tests. The reviewers were asked to evaluate only the 
validity of the content of selected tests. In focusing attention upon 
the subject matter of each test, the reviewers were concerned with 
such questions as the following: 

(1) What do the format and items of a test reveal about the test- 
makers' underlying assumptions about and concepts of the 
English language, of readers' responses to literature, and of
the learning and teaching of English? 

(2) How valid are these assumptions, concepts, and items in the
light of what is known now about the nature of the English
language, about current acceptability of a variety of dialects
and usages in speaking and writing, about readers' responses 
to literature, and about teaching English in a pluralistic 
society? 






The reviewers were not concerned with other features such as the 
national sampling of test-takers, the establishing and revising of
norms, reliability, and means of interpreting results. 

Part One of this report offers a context for current testing
somewhat larger than that represented by an individual teacher's
experiences with standardized tests or by a review of a particular test. I
have endeavored to clarify what seems to be meant by the term
"educational accountability" that occurs in almost every statewide
testing program; to identify what appear to be encouraging
developments in statewide programs of testing and assessment; and to point
out some major problems in English tests, and tests in general, that
English teachers should consider. The evaluation of a particular test
should be considered also in the larger context presented in Part
One.

Part Two offers 58 reviews, the major purpose of this report.
These represent evaluations of the content of 51 different tests. Tests
of reading skills are not included, however. The introductions to the 
reviews discuss the kinds of tests examined and the relation of the 
segment of English tested to an entire program of English; identify 
some criteria applied to each test; or point out the purposes of the 
tests, their strengths and weaknesses, and kinds of tests needed. As 
is immediately apparent, each reviewer was free to make his own 
judgments and to write his reviews in whatever style he preferred. 

Part Three summarizes some of the problems in educational
accountability and suggests specific things that teachers can do.

Professor Walter Loban, School of Education, University of
California, Berkeley, generally discusses tests intended to produce
information about a child's developing competence in using the English
language, from preschool through the elementary grades. Professor
William A. Jenkins, Academic Vice-President, Florida International
University and former editor of Elementary English, reviews tests of
grammar, usage, diction, punctuation, and spelling prepared
principally for grades 4-9. Professor J. N. Hook, Professor of English and
Counselor, Council on Teacher Education, Emeritus, University of
Illinois, and the first Executive Secretary of the NCTE, reviews tests
of grammar, diction, usage, punctuation, and spelling intended
primarily for grades 9-12; some forms are also used in the junior high
school. In addition, he reviews several tests containing sections on
effectiveness of written expression; these reviews are included in the
section on writing tests, introduced by the late Professor Richard
Braddock, Department of English, University of Iowa. Professor
Alan C. Purves, College of Education, University of Illinois, reviews
tests of knowledge about, interpretation of, and responses to
literature. Professor Dan Donlan, University of California, Riverside,
reviews one sequence of tests on literature.

Although the NCTE Committee to Review Standardized Tests 
tried to make a rather thorough search of tests available to teachers 
of English, it makes no claim to having reviewed every English test 
described in publishers' catalogs. Instead, it selected a broad range
of instruments that seemed to be in wide use in schools. 

The work of the NCTE Committee to Review Standardized Tests, 
now completed, will be extended through the charges given to the 
NCTE Task Force on Measurement and Evaluation in the Study of 
English and the NCTE Committee to Study the National Assessment
of Educational Progress. These appointments further manifest the
Council's involvement with implications of the expanding uses of
assessment in English. 

I wish to thank Professors Hook, Jenkins, Loban, Purves, and 
Donlan for their many valuable contributions to this report. I wish 
to thank also Dr. James R. Squire, former chairman of the Committee
on Research, and Professor Purves, also a former member of the
Committee, who originally suggested that reviews of English tests be
published by the Council and who initiated, planned, and nourished
the project culminating in this report. 

Selected References 

1932 Leonard, Sterling A. Current English Usage. English
Monograph No. 1 of the National Council of Teachers of English.
Chicago: The Inland Press.

1938 Marckwardt, Albert H., and Walcott, Fred G. Facts about
Current English Usage. English Monograph No. 7 of the
National Council of Teachers of English. New York:
D. Appleton-Century Company.

1940 Fries, Charles C. American English Grammar. English
Monograph No. 10 of the National Council of Teachers of
English. New York: D. Appleton-Century Company.

1946 Pooley, Robert C. Teaching English Usage. English
Monograph No. 16 of the National Council of Teachers of English.
New York: Appleton-Century-Crofts, Inc.

1952 NCTE Commission on the English Curriculum. The English
Language Arts. Curriculum Series Vol. I. New York:
Appleton-Century-Crofts, Inc.

1954 NCTE Commission on the English Curriculum. Language
Arts for Today's Children. Curriculum Series Vol. II. New
York: Appleton-Century-Crofts, Inc.

1956 NCTE Commission on the English Curriculum. The English
Language Arts in the Secondary School. Curriculum Series
Vol. III. New York: Appleton-Century-Crofts, Inc.

1963 Braddock, Richard, Lloyd-Jones, Richard, and Schoer,
Lowell. Research in Written Composition. Urbana, Ill.:
National Council of Teachers of English.

1963 Loban, Walter D. The Language of Elementary School
Children. Research Report No. 1. Urbana, Ill.: National
Council of Teachers of English.

1965 NCTE Task Force on Teaching English to the Disadvantaged.
Language Programs for the Disadvantaged. Edited by
Richard Corbin and Muriel Crosby. Urbana, Ill.: National
Council of Teachers of English.

1965 Hunt, Kellogg W. Grammatical Structures Written at Three
Grade Levels. Research Report No. 3. Urbana, Ill.: National
Council of Teachers of English.

1965 Judine, Sister M., I.H.M. A Guide for Evaluating Student
Composition. Urbana, Ill.: National Council of Teachers of
English.

1966 Bateman, Donald R., and Zidonis, Frank J. The Effect of a
Study of Transformational Grammar on the Writing of
Ninth and Tenth Graders. Research Report No. 6. Urbana,
Ill.: National Council of Teachers of English.

1966 Committee on the National Conference on Research in
English. Research on Handwriting and Spelling. Edited by
Thomas D. Horn. Urbana, Ill.: National Council of
Teachers of English.

1966 Loban, Walter. Problems in Oral English. Research Report
No. 5. Urbana, Ill.: National Council of Teachers of
English.

1967 Board of Education of the City of New York. Nonstandard
Dialect. Urbana, Ill.: National Council of Teachers of
English.

1967 O'Donnell, Roy C., Griffin, William J., and Norris,
Raymond C. Syntax of Kindergarten and Elementary School
Children: A Transformational Analysis. Research Report
No. 8. Urbana, Ill.: National Council of Teachers of
English.

1968 Petty, Walter T., Herold, Curtis P., and Stoll, Earline. The
State of Knowledge about the Teaching of Vocabulary.
Urbana, Ill.: National Council of Teachers of English.

1969 Mellon, John C. Transformational Sentence-Combining:
A Method for Enhancing the Development of Syntactic
Fluency in English Composition. Research Report No. 10.
Urbana, Ill.: National Council of Teachers of English.

1969 Sherwin, J. Stephen. Four Problems in Teaching English:
A Critique of Research. Scranton, Pa.: International
Textbook Company for the National Council of Teachers of
English.

1970 Labov, William. The Study of Nonstandard English.
Urbana, Ill.: National Council of Teachers of English for the
Center for Applied Linguistics.

1970 NCTE Commission on the English Curriculum. On Writing
Behavioral Objectives for English. Edited by John Maxwell
and Anthony Tovatt. Urbana, Ill.: National Council of
Teachers of English.

1971 NCTE Commission on Composition. The Student's Right to
Write. Urbana, Ill.: National Council of Teachers of
English.

1971 O'Hare, Frank. Sentence Combining: Improving Student
Writing without Formal Grammar Instruction. Research
Report No. 15. Urbana, Ill.: National Council of Teachers
of English.

1972 NCTE Commission on the English Curriculum.
Accountability and the Teaching of English. Edited by Henry B.
Maloney. Urbana, Ill.: National Council of Teachers of
English.

1972 Purves, Alan C., and Beach, Richard. Literature and the
Reader: Research in Response to Literature, Reading
Interests, and the Teaching of Literature. Urbana, Ill.:
National Council of Teachers of English.

1973 NCTE Commission on Reading. Accountability and Reading
Instruction: Critical Issues. Edited by Robert B.
Ruddell. Urbana, Ill.: National Council of Teachers of English.

1974 Cullinan, Bernice E. Black Dialects and Reading. Urbana,
Ill.: ERIC Clearinghouse on Reading and Communication
Skills and National Council of Teachers of English.

1974 Diederich, Paul B. Measuring Growth in English. Urbana,
Ill.: National Council of Teachers of English.

1974 Pooley, Robert C. The Teaching of English Usage. Urbana,
Ill.: National Council of Teachers of English.

1974 Uses, Abuses, Misuses of Standardized Tests in English: A
First-Aid Kit for the Test-Wounded. Urbana, Ill.: National
Council of Teachers of English.

1975 Fagan, William T., Cooper, Charles R., and Jensen, Julie M.
Measures for Research and Evaluation in the English
Language Arts. Urbana, Ill.: ERIC Clearinghouse on Reading
and Communication Skills and National Council of
Teachers of English.

1975 NCTE Task Force on Measurement and Evaluation in the
Study of English. Common Sense in Testing. Urbana, Ill.:
National Council of Teachers of English.

1975 Mellon, John C. National Assessment and the Teaching of
English. Urbana, Ill.: National Council of Teachers of
English.






Statewide Accountability Programs of Testing and Assessment

Alfred H. Grommon



"Accountability! Accountability!" "When I use a word," says
Humpty Dumpty, "it means just what I choose it to mean--neither
more nor less." And apparently so say or imply some persons in state
legislatures, state departments of education, and local communities
concerned with finding out for some purposes, somehow, something
about the effectiveness of programs and teaching in schools for
which they feel responsible or in which they have involvement as
parents and taxpayers. However educational accountability may be
defined or implied, efforts to hold schools accountable to the public
and to outside agencies are now nationwide.

The Educational Testing Service's (ETS) study, State Educational
Assessment Programs, 1973 Revision, reports that each of the fifty
states, the District of Columbia, Puerto Rico, and the Virgin Islands
either had a statewide educational assessment program already in
force or had one in the planning stage. ETS also made a follow-up
study, State Testing Programs, 1973 Revision, of its 1968 survey and
found that in 1972-73, thirty-three states had forty-two statewide
testing programs functioning, and additional programs were being
planned. Unquestionably, mandated programs of educational
accountability, in some form, are important components of inquiries
into what states are getting for their educational investment.

These programs are not trouble-free. To an increasing number of
teachers, "accountability" is a threatening concept and term.
Teachers' protests against the accountability movement in education
seem to be multiplying. The ETS surveys disclose that a rather
common problem experienced by administrators of several programs is
the need to reckon with teachers' negative attitudes toward
mandatory uses of standardized tests, especially where the results may
prove disadvantageous for teachers, their students, and their
schools. According to The New York Times (July 6, 1974), many of
the 10,000 members of the National Education Association (NEA)
attending a convention in Chicago expressed their determination "to
fight accountability unless the teachers themselves have a key role in
the establishment of plans." Terry E. Herndon, executive secretary
of the NEA, then having a membership of 1,400,000, was quoted as
saying, "Teachers do not--and will not--accept this simplistic,
bureaucratic approach to teacher accountability that is prevalent in
America today." He suggested that "teachers ought to refuse to give
tests that are not found acceptable." The Times further reports,
"Teacher antagonism toward tests has risen to such heights that the
National Education Association has called for a moratorium on all
group standardized intelligence, aptitude and achievement tests."

The question of acceptability of standardized tests now being used 
in English classes has long been a concern of members of the Na- 
tional Council of Teachers of English. This report is one outgrowth 
of the Council's commitment to bring to the profession information
pertinent to urgent educational problems encountered by teachers of
English. Many are deeply concerned about statewide and local
programs that require the use of standardized English tests as a means
of presumably identifying achievement levels in limited aspects of a
complex subject. The major purpose of this report is to offer
teachers and administrators evaluations of many published English tests
and to offer an extended context, about tests and programs of
testing and assessment, in which the nature and uses of objective
measures may be considered.

Teachers' participation in planning and administering any system
of educational accountability is indeed important--and will be
commented upon later in this report--but questions arise about what
kind of system they might be participating in. For what purposes are
programs of accountability designed? In communities consisting of
diverse socioeconomic and cultural constituencies, for whose benefit
are the programs planned and the results to be used? What effect, if 
any, will test results have upon state and local decisions on educa- 
tional policy and upon the status of teachers? 

According to Henry M. Levin, the extensive literature of
educational accountability indicates "four relatively distinct concepts of
accountability: (a) as performance reporting; (b) as a technical
process; (c) as a political process; (d) as an institutional process."
The kind of accountability considered here is mainly performance
reporting.

Results of standardized English tests, along with those of other
kinds being used statewide and in local school districts, are reported
to such constituencies as governors, state legislatures, state
departments of education, local school districts and communities,
teachers, and students. Inquiries into the purposes of performance
reporting, the relation of accountability to educational goals, the processes
by which statewide goals were established--by whom and for whom,
the uses of test results, and the kinds of involvement of and
consideration given to diverse groups in the community, all such inquiries
would seem to lead inevitably into aspects of technical, political, and
institutional processes. Inquirers should not overlook positive
aspects of plans. According to Frederick McDonald, director of
Educational Studies at ETS, "Accountability is too frequently defined in
negative terms, with too much emphasis on its punitive
interpretations." As reported in ETS Developments, McDonald sees
"accountability as the acceptance of responsibility for consequences by those
to whom citizens entrust the performance of certain public services.
Thus, in its educational context, an accountability system's primary
purpose is to promote student development." Consequently,
teachers' participation in any program of educational accountability
surely should not be limited to ensuring the selection of acceptable tests.
Rather, they should contribute to the entire scope of the program.

Educational accountability involves "testing" and "assessment."
Although these terms are used interchangeably in this discussion,
the distinctions made in the ETS surveys should be clarified,
especially for teachers concerned about their roles in accountability
systems. Standardized tests are basic to all statewide testing programs
and usually constitute the only kind of measurement provided by the
state. While the use of standardized tests may be part of some
assessment programs, this use does not circumscribe the scope of an
assessment program, since assessments are designed to explore a
wide range of educational needs and services throughout a state's
public schools.

Henry S. Dyer, while vice president of ETS, stated, "You can have
assessment of educational programs and services without any testing
at all. You can also have testing without any assessment of
educational programs and services." An Oklahoma report, included in
the previously cited ETS 1973 assessment survey, also identifies
some confusion about these concepts and further clarifies the
distinctions made in that state:

One major problem developed and continues: the difficulty
encountered in convincing the uninformed (including test makers
and the testing company personnel) that needs assessment is not
merely getting results on standardized tests or criterion-referenced
test items in the academic or nonacademic areas, but
includes the whole range of human needs which may or may not
relate to the way our schools are currently operating.

Because the ETS 1973 assessment survey and the Oklahoma report
are referred to frequently in the following discussion, it seems
essential to clarify here the distinctions ETS and many states make
between the limited nature and purposes of instruments used in
statewide programs of testing and the more extensive characteristics of
assessment.

The majority of statewide programs of testing and assessment
include efforts to measure the results of the teaching of English, and to
some extent the effectiveness of the teachers and of their English
programs. Thirty-three states, now conducting statewide
assessments of educational needs or planning to do so, are or will be
assessing the results of their school programs in what are identified
as language arts, English, grammar, spelling, punctuation,
composition or effectiveness of written expression, literature, and speaking.
In addition, forty-seven states are assessing children's needs in
reading. Moreover, the ETS 1973 testing survey shows that thirty-two
programs in twenty-seven states also include tests of English and
writing, and thirty-two programs in thirty states test reading skills.
Consequently, there seems to be no way in which English teachers
can avoid being involved with some form of testing, either as a part
of statewide programs or of those in local districts or individual
schools.

Accountability and Statewide Programs

Uses of any kind of standardized or locally prepared tests should be
considered within the larger context of what is generally called
"accountability." As indicated earlier, all the states, the District of
Columbia, Puerto Rico, and the Virgin Islands are now either
administering programs of testing and assessment or planning to
institute them. All systems are intended to produce information about
aspects of education in public schools for which the local
community, state department of education, or the state legislature is
holding the schools accountable, in one form or another.

In his article, "Evaluation, Decision-Making, and Accountability,"
Garlie A. Forehand differentiates between evaluation and
accountability and their relation to the making of educational decisions. In
his view, evaluation "consists of gathering, processing, and using
information about each element [in the process of curriculum
development: goals, procedures, implementation, feedback]. It takes place
throughout the cycle. Evaluation is an ongoing activity, rather than
a single one-shot study, an aid to decision-making rather than a
summary judgment, a rational procedure rather than a routinized
program." Comparing and contrasting evaluation and accountability,
Forehand states:

Now, in the early 1970's, evaluation has acquired a still different
and still new guise, which goes by the name of accountability.
Many of the characteristics and techniques of the other concepts
of evaluation are retained in this newer version. The value of a
program is to be measured by its effects on the performance of
students. The technique of defining behavioral objectives
(educational goals translated into objectively observable behaviors) is
used to assess goals and their attainment. Results are to be
assessed in comparison to predefined standards. The main
difference between accountability and other concepts of evaluation lies
in the relationship posited between the public (usually as
represented by political bodies) and the educator. Accountability
assumes the relationship to be a contractual one. . . . In general, at
least in the discussions of accountability most commonly
encountered thus far, the methods for achieving the objectives are not
part of the contract, save, by assumption, in unstated ethical
strictures. As compared to other concepts of evaluation discussed
here, accountability is relatively far removed from considerations
of method, theory, hypothesis, and concept.

Traditional evaluation for course improvement is carried out by
the development team, emphasizing formative questions for the
purpose of revising procedures. Evaluation of new curricula may
be carried out from any perspective; it generally emphasizes
summative evaluation for the purpose of making decisions about
adoption or support. Accountability generally takes the
perspective of the public, concentrates on the evaluation of end
products, and is conducted for the purpose of learning the extent to
which the teacher or school has met obligations.

The central importance of the public's involvement in programs of
testing and assessment is plainly stated also in the ETS 1973
assessment survey: "Accountability is the heart and soul of most
assessment programs. More importantly, state education agencies in every
state in the union are taking the leadership in helping or coercing
school administrators to answer to the public's cries for better
information about what children know and how well schools are doing
their job." This analysis of the state's role in programs of
accountability continues:

The issue should not be one of state-imposed accountability
versus locally initiated accountability. State education agencies
and state legislatures have their own reasons and suffer their own
pressures for collecting information about students' educational
achievements. School districts' accountability to the state should
not be confused with school districts' accountability to their own
communities or with teachers' accountability to their own school
systems.

The issue should be whether state-imposed accountability
systems encourage or discourage school administrators and teachers
from developing their own accountability plans.

It can be said that accountability laws are the signs of the
public's lack of faith in the effectiveness of schooling. It can be said,
also, that accountability laws are the signs that school officials did
not, or could not, respond on their own to accountability
demands.

Because accountability to the public about schools' effectiveness is
altogether too general a statement of purpose for statewide testing,
ETS questionnaires included items directed at statements of more
specific purposes and uses of results of tests. The following
summaries of responses to questions are taken from the ETS State
Testing Programs, 1973 Revision (pp. 2-3, 7):



Question 2. What is the major purpose of the program?
(42 programs in 33 states responding)

                                          Programs  States

1. Instructional evaluation                  27       23
2. Identification of individual
   problems and talents                      23       19
3. Guidance                                  22       20
4. Provide data for a management
   information system                        14       14
5. Placement and grouping                    14       13



Question 19. How are the results of the program used?
(42 programs in 33 states responding)

                                          Programs  States

1. Instruction                               28       24
2. Program evaluation                        26       22
3. Program planning                          26       23
4. Guidance                                  23       22
5. Comparative analysis across
   schools                                   14       13

The most common goal is to evaluate instruction. When asked about
uses of data resulting from tests, the states indicate, again, that the
most common use is to evaluate the quality of instruction and of
educational programs in public schools. And yet, ETS found in its
survey of State Educational Assessment Programs, 1973 Revision (p.
7), that only Pennsylvania reported collecting information about
teachers' methods of instruction. Moreover, Forehand's statement
quoted earlier indicates that, based upon current discussions of
accountability, "accountability is relatively far removed from
considerations of method, theory, hypothesis, and concept."

ETS also included in its state testing survey (pp. 2-3) an inquiry
about methods the states use to inform local schools about test
results and the interpretation of them:

Question 20. What efforts are undertaken to assist local
interpretation and use of program results?
(42 programs in 33 states responding)

                                          Programs  States

1. Workshops                                 31       30
2. Consulting                                26       25
3. Publications                              24       21
4. Audiovisual aids                          11       11
5. Nothing                                    5        2



Question 22. For whom is this assistance provided?
(37 programs in 33 states responding)

                                          Programs  States

1. Administrators                            32       29
2. Classroom teachers                        26       25
3. Guidance counselors                       25       25
4. School boards                             10       10
5. Community groups                           8        8
6. PTA                                        6        6
7. Students                                   5        5






Evidently, most states provide local school administrators, teachers,
counselors, and board members with information about and
implications of the results of statewide tests. Yet little seems to be done to
provide the same kinds of information to nonschool members of
local communities, even though a major feature of educational
accountability programs is the reporting of information about local
schools to the local community as well as to the larger community.
The following quoted question and comment also are related to the
pervasive problem of communication:

Question 24. Who receives a copy of the program reports?
(40 programs in 33 states responding)

                                          Programs  States

 1. Schools                                  31       28
 2. School districts                         27       25
 3. State Education Agency                   23       21
 4. Students                                 20       14
 5. Principals                               17       16
 6. State Board of Education                 16       16
 7. Teachers                                 15       14
 8. Colleges or universities                 12       11
 9. Newspapers                               11       11
10. Governor or Legislature                   8        8

Only seven programs in seven states report that parents are given
reports of results and only six programs in six states distribute
reports to the general public . . . most often only upon request.
Tying this with the information from question 22, one can conclude
that little assistance in the interpretation and use of program
results is provided for nonprofessional members of the community
and the results of programs are not often shared with these
individuals.

In addition, about twenty of the states operating statewide
assessment programs send reports to the Educational Resources
Information Center (ERIC).

Although many states evidently intend to report generally the
results of their testing programs, the fact is that those most directly
involved in educational programs (students, teachers, and parents)
rarely receive test results. Despite the proclaimed top priority given
in responses to questions 2 and 19 (the evaluation and improvement
of instruction and programs), teachers learn of only half of the
results reported to schools and school districts. In many states, parents
get reports only by requesting them.

The primary emphasis given by many states to tests as a means to 
evaluate instruction was of some concern to the ETS survey staff be- 
cause this objective may be based upon questionable assumptions: 

It is probably safe to say that statewide assessment will not
produce any startling revelations about what can be done by teachers
with pupils to help children learn more effectively. The conclusion
is not meant to be as much an indictment of statewide assessment
as it is a statement of its limitations. Revelations in teaching
practices and methods can come only from intensive analysis within
each school building and within each classroom. If statewide
assessment data can whet the appetites of teachers and
administrators for doing the kinds of evaluation only they can do for
themselves, statewide assessment will serve its purposes well.

To draw attention to the importance of helping schools make local 
decisions, the ETS staff classified state assessment programs into 
three groups: the seventeen programs designed to collect informa- 
tion to be used in making decisions at the state level; the thirteen 
programs designed to collect information to be used mainly to help 
local districts make decisions; and those programs only beginning to 
emerge in twenty-four other states.

R. E. Stake, a writer on laws of educational accountability, also
questions the effectiveness of such laws in improving the quality of 
education in local schools: 

Most state accountability proposals call for more uniform
standards across the state, greater prespecification of objectives, more
careful analysis of learning sequences and better testing of
student performances. . . . If state accountability laws are to be in
the best interests of the people, they should protect local control
of the schools, individuality of teachers, and diversity of learning
opportunities. They should not escalate the bureaucracy at the
state or local level. They should not allow school ineffectiveness to
be more easily ignored by drawing attention to student
performance. They should not permit test scores to be overly influential
in schoolwide or personal decisions; the irreducible errors of test
scores should be recognized. The laws should make it easier for a
school to be accountable to the community in providing a variety
of high quality learning opportunities for every learner.

Whatever educational benefits result from statewide testing
depend mainly upon the awareness of local school and district
administrators, an awareness that they are being held accountable to
teachers (as is stated in the NCTE resolution on accountability)
and for sharing test results with the teachers and students. If
the quality of programs and instruction is to be thereby improved,
then administrators must provide teachers and students with some
interpretations of results and some local implications. Moreover,
they must ensure the involvement, in a major way, of their teachers
in any consequent inservice activities intended to improve programs
and instruction.

Furthermore, teachers as well as administrators should
participate in the preparation of any reports to the local community.
English teachers, for example, can make sure that test results are
appropriately related to particular parts of their English courses and,
even more importantly, that tests are measured against the context
of the entire English program in local schools. In so doing, English
teachers can awaken the community to another aspect of
accountability basic to that NCTE resolution: the relationship of students'
performances to the larger educational environment resulting from
the degree of awareness that students, parents, local and wider
communities seem to have of their being, in turn, accountable to
teachers in their schools.



Trends in Statewide Programs

Lest the preceding discussion seem to emphasize unduly the negative 
features of educational accountability and of statewide programs of 
testing and assessment, attention should be given to the positive as- 
pects reported in the ETS surveys. Some merits of statewide pro- 
grams emerge from a consideration of them in a context larger than 
an individual teacher's experiences with administering standardized 
tests in the classroom. 

For the sake of assessment, states had to formulate educational 
goals, applicable statewide. Consideration had to be given to such 
questions as: 

What learning should be achieved by the full range of pupils 
throughout the public schools in the entire state? 

Who should determine what each child's goals should be? 

For what purposes and by what means are each child's educational
needs and achievements to be assessed?

How are the results of assessment to be used? 






According to the ETS 1973 survey of state assessment programs,
forty-three states already had established educational goals; the
other seven were in the process of doing so. These goals, ranging
from one to seventy-eight in the various states, were classified into
three groups by ETS: (1) goals identifying desired outcomes for the
learner; (2) those related to such processes as having students and
other citizens participate in developing curricula; and (3) those
involving such institutional matters as standards for personnel,
teaching materials, and educational programs. Apparently educational
authorities in many states had not engaged previously in an exercise
common to teachers: identifying and writing objectives. But the
moment a state decided to establish programs of assessment, the first
obligatory step was the formulation of educational goals acceptable
to a representative segment of concerned people in the state.

Another positive aspect evidenced by the goals of, and tests used
in, assessment programs is that most states no longer consider
schools and the individual teacher responsible for teaching solely the
traditional three "Rs." Instead, many goals represent an explicit
concern with the individual pupil's physical, emotional, social, and
intellectual development. The ETS survey provides considerable
evidence of the extensive efforts to ascertain what is happening in the
affective, noncognitive domain of pupils' experiences in school. In
some states, in fact, the emphasis upon the affective domain is so
pronounced that some people in those states have inferred that
perhaps the cognitive, traditional skills are not receiving enough
attention. The ETS staff grouped the objectives taken from the states'
goals into the following categories, which indicate clearly the
emphasis now being given to the affective domain of educational
experiences (Assessment Programs, p. 6):

 1. Basic Skills
 2. Cultural Appreciation
 3. Self-realization
 4. Citizenship and Political Understanding
 5. Human Relations
 6. Economic Understanding
 7. Physical Environment
 8. Mental and Physical Health
 9. Creative, Constructive and Critical Thinking
10. Career Education and Occupational Competence
11. Lifelong Learning
12. Values and Ethics
13. Home and Family Relations






To these categories, add the following from the ETS testing survey
(Testing Programs, p. 5):

Question 11. Which of the following noncognitive areas
are being tested?
(9 programs in [illegible] states responding)

                              ELEMENTARY           SECONDARY
                              6        6           5        5
                           Programs  States     Programs  States

1. Attitudes
   toward school              5        5           2        2
2. Self-concept               4        4           1        1
3. School plans
   and aspirations            2        2           1        1
4. Interests                  1        1           4        4
5. Biographical
   data                       -        -           2        2

For English teachers whose only encounters with standardized
tests may be with those aimed at pupils' abilities to punctuate,
capitalize, and spell, these examples of widespread attempts to assess
noncognitive experiences and attitudes may be encouraging.
Evident, too, are the many opportunities English teachers have to
contribute significantly to helping pupils attain affective goals. The
increasing sophistication in assessing elements of pupils' affective
experiences in school may lead to improved instruments (some
created by the teachers themselves) for helping teachers and students
explore aspects of noncognitive learning. Teachers who have
reservations about the relation of statewide programs to pupils' affective
experiences in English programs should inquire about the goals of
their own state's report.

A third encouraging trend in states' development of educational
goals is the use of citizen advisory groups, teachers, and students;
the use of statements prepared by citizens in statewide meetings;
and the use of research on why and how students learn. Connecticut
reports, for example, that "thousands of citizens" contributed to the
development of goals for its schools. Kansas reports that in addition
to the participation of students, lay citizens, and professional
consultants, "approximately 8,000 teachers" contributed to the
establishment of goals, one of which stresses the "need to involve lay
citizens and students in the planning of the school's curriculum."
Approximately 120,000 people in Ohio reviewed and refined the
goals for that state. Other states drawing upon citizens, teachers,
and students include Florida, Georgia, Nebraska, New Jersey,
Rhode Island, Utah, and Wisconsin.

To facilitate the involvement of citizens, California published a
three-volume guide, Education for the People, to be used in local
communities. The guide presents models of how community
representatives can participate in creating educational goals appropriate
to schools in their community. During 1974, the community goals
from throughout the state were to become the basis for establishing
an approved set of goals for the entire state.

Thus, the states' decisions to embark upon programs of
educational accountability led to their establishment of statewide goals
and then to the design of instruments to identify what the schools were
contributing to pupils' progress toward reaching these goals. In the
process, state boards of education and other educational authorities
drew upon recommendations made by thousands of teachers and
students; labor, business, and professional representatives; and
specialists from universities, colleges, and community colleges.
Whatever effects statewide programs of testing and assessment may
have otherwise produced throughout the schools, the states should
be credited with accelerating widespread participation of teachers,
students, and lay citizens in creating a range of cognitive and
affective goals for all pupils in public schools.

A fourth positive development reflected in the ETS surveys is that
an increasing number of states are either already using standardized
tests "tailor-made" to fit identifiable circumstances or are in the
process of creating such tests. This relates directly to the teachers'
reservations about, and objections to, the use of commercially
prepared standardized tests. Many doubt that any standardized test can
fit the circumstances of an individual child, or indeed the
characteristics of a particular class or community, and can measure effectively
a pupil's English language skills and his or her relationships with
literature.

The problem is stated explicitly in the ETS assessment report on 
the program in Alaska, Alaska Educational Assessment and Model 
of Reasonable Expectation, and reflects an enlightened point of 
view: 

Alaska has elected to design their first statewide assessment of
student skills with tests that are culturally free in relation to
content and linguistically equivalent in relation to question wording.
There are 30 school districts in the state, one of which is under
state control. This district consists of 126 rural villages in which
seven different Eskimo dialects are spoken in addition to three
major Indian dialects and standard American English. The issue
is further compounded in that students range from those who are
genuinely bilingual to those who are nonlingual (that is, they do
not have an adequate command of either the ancestral language
or the English language).

Does any standardized test now exist that is appropriate to language
abilities and educational needs of these diverse Alaskan pupils, their
teachers, their school, and their communities?

The ETS 1973 testing survey shows that of the "41 programs in 32
states, 20 programs in 20 states use only tests purchased 'as is' from
test publishers." But of importance here is that "thirteen programs
in seven states use only tests which have been tailor-made, and in
eight programs in eight states, a combination of purchased, tailored,
or revised measures is used." In response to the question about who
developed these tailor-made tests, the respondents for 21 programs
in 13 states reported the following, in descending order of
frequency: state education agency, committee of professionals, college
or university, test publisher, and outside contractor.

In Part Three of this book, the process used in California to facili- 
tate the development of tailored tests for English will be described as 
an example of steps other communities may wish to consider in 
planning procedures for developing or acquiring tests suited to their 
educational goals and to the learning environment and styles of their 
pupils. 

Present information indicates, then, that either as a result of, or 
concurrent with, the mandating of statewide programs of testing and 
assessment, state and local communities are striving to obtain or 
create tests especially suited to local circumstances. Moreover, fur- 
ther evidence emerging from the ETS surveys indicates that this 
trend certainly will continue. In responding to the question about 
which of nine elements in their programs are most likely to change
in the near future, representatives of 29 programs in 26 states
reported that the tests now being used are the most likely to change.

Standardized or norm-referenced tests certainly have their place
in a program in which the student, parent, teacher, school, district,
and state want to know how an individual student, class, school, or
district compares in certain educational knowledge and skills with
other students in the nation. It is useful also to know (a) how a
student's knowledge and skills at a particular time compare with an
earlier stage in the program or course and (b) how the student's
cognitive skills and knowledge (and perhaps noncognitive
development) are related to specific educational goals of the teacher,
school, or state.

To get such information, a teacher usually tailor-makes tests
based upon specific purposes, content, skills, and affective elements
of the student's immediate experiences in that class. In this sense,
the teacher is creating a criterion-referenced test. Each item in the
test probably is related directly, or referenced to, specific aspects of
the content or skills taught in a preceding block of experiences or
time. The criteria the teacher is likely to use to judge the student's
performance will be based on the degree to which specific goals are
reached. The standard might be the student's previous performance
or a comparison with the performance of other students in the class.
For these purposes of evaluation, the national norms of
norm-referenced tests would be irrelevant.
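
The contrast between the two kinds of interpretation can be made
concrete with a small sketch. It is an illustration only: the function
names, the scores, the two objectives, and the 80 percent mastery
cutoff are all assumptions introduced here, not figures from the ETS
surveys or from any published test.

```python
from bisect import bisect_left


def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring below `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of norm scores strictly below
    return 100.0 * below / len(ordered)


def objectives_mastered(item_results, mastery_cutoff=0.8):
    """Criterion-referenced reading: does the student meet the cutoff on each objective?

    `item_results` maps an objective name to a list of 1/0 item outcomes.
    The cutoff is a hypothetical standard a teacher might choose.
    """
    return {
        objective: (sum(items) / len(items)) >= mastery_cutoff
        for objective, items in item_results.items()
    }


# Hypothetical data, invented for illustration.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
student_total = 25
student_items = {
    "punctuation": [1, 1, 1, 1, 0],
    "spelling": [1, 0, 1, 0, 0],
}

print(percentile_rank(student_total, norm_group))
print(objectives_mastered(student_items))
```

The first figure places the student relative to other students and
says nothing about which skills were taught or learned. The second
ignores other students entirely and reports only whether each
teacher-specified goal was reached, which is the sense in which
national norms are irrelevant to a teacher-made test.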

A fifth trend that should be reassuring to English teachers
indicates that many states have replaced conventional standardized
norm-referenced measures with criterion-referenced tests, some
states are in the process of doing so, and others are using
criterion-referenced tests to supplement norm-referenced ones. For example,
Alaska reported to ETS that all tests are "being tailor-made for
Alaskan students" and that all tests will be criterion-referenced;
Arizona reported that it is planning to shift from norm-referenced
tests to criterion-referenced ones; and Colorado states that "all
cognitive tests are criterion-referenced measures." Several other states
indicated that they have adopted criterion-referenced tests or are in
the process of so doing.

The preceding discussion has been an attempt to identify the
discernible trends in statewide assessment programs. Such trends may
ease some teachers' concerns about the uses of standardized tests
and about the future prospects of accountability. Understandably,
many teachers do object to the uses of standardized English tests as
a means of holding them and their schools accountable to outside
agencies and to the public. Some are suspicious of the motives of
those responsible for programs of testing and accountability,
especially when results are to be fed into a state
planning-programming-budgeting system (PPBS) and then become a factor in the allocation
of funds. Many feel pressured by these measures and the
concomitant requirement that they reduce educational purposes to
quantifiable behavioral terms.

In New York City, however, some teachers felt that the
accountability plan would provide them with protection against unfair
charges about the effectiveness of their teaching. In September 1974,
the New York City Public Schools began to administer, in two or
three selected schools in each of the city's thirty-two decentralized
community school districts plus certain high schools, a "pioneering"
system of measuring the effectiveness of these schools. The system
had been developed over a three-year period under a contract
between the city schools and ETS. According to school Chancellor
Irving Anker, as reported in The New York Times (April 9, 1974), this
system of school accountability ultimately will be administered in
the city's 950 schools, attended by 1,100,000 pupils. Mr. Anker said
also that the system "will make it possible to compare the
performance of schools that are operating under comparable conditions."
Such a project of accountability resulted, according to The New
York Times (July 6, 1974), from a "provision in the 1969 contract
between the Board of Education and the United Federation of
Teachers. The union had insisted on the provision, according to the
U.F.T., 'as a protection for teachers.'" Further evidence of the
U.F.T.'s attitude toward city-wide testing is given in a then current
issue of the union's newspaper, The New York Teachers. Sandra
Feldman, director of the U.F.T. staff, stated that the plan of
accountability to be introduced in September 1974 would help
teachers "because it is going to help us identify our own effectiveness and
give us the concrete proof we need that schools and teachers can do a
job if resources are provided." The plan would help "examine all of
the factors that affect learning: separate out the socio-economic
effects on which the schools have no influence at all; and find out what
in-school factors make for effective learning, what works and what
doesn't." She reiterated this favorable attitude during the 1974
N.E.A. convention in Chicago. According to that same New York
Times report, Feldman explained that city teachers participated at
the outset in the planning of the accountability program. She said:
"We thought we ought to get in and have a voice right at the
beginning. Otherwise legislatures and school boards try to impose
Neanderthal plans on teachers. When we started the planning, most of
the other groups were licking their chops saying, 'now we can get the
teachers.' They couldn't understand why we had agreed to help
develop an accountability plan." Such supportive statements by a
union official may be of special interest to those teachers who are
opposed to the very nature of mass testing and assessment and
skeptical of the motives of those responsible for these inquiries and of their
uses of results.






Perhaps some or none of the trends mentioned in this brief review
of recent developments in statewide programs entirely meet
reservations many English teachers have about the nature and uses of
standardized tests. The teachers may not feel, as apparently do members
of the New York City U.F.T., that results of tests actually may be a
protection for them or that increased resources may be allocated to
schools and districts as a result of the identification, through tests,
of greater needs. It is hoped, nevertheless, that teachers will give
some consideration to recent developments.

The ETS 1973 assessment report draws upon some of Henry
Dyer's statements in support of statewide assessment:

Dyer (1966) reminds us how loudly the critics shouted in re-
sponse to the plan of a National Assessment of Educational Prog-
ress. Some of the arguments raised against National Assessment
were: (1) the tests would put undue pressure upon students; (2)
the findings would lead to unfair comparisons; (3) teachers would
teach for the tests to the neglect of important educational objec-
tives; (4) the program would ultimately force conformity and im-
pose federal control of schools.

Dyer reacts by stating that ". . . one would suppose that to
assess the educational enterprise by measuring the quality of its
product is an egregious form of academic subversion" (1966, p.
69). Dyer sees the need of statewide testing programs for two rea-
sons: continuity in the educational process and stability in educa-
tional systems. State-wide testing can help to bring greater con-
tinuity into the educational process if it can bring to teachers a
continuous flow of information about the developmental needs of
students regardless of where they are or where they have been if
tests are seen not so much as devices for selection or classification,
but as instruments for providing continuous feedback indispen-
sable to the teaching-learning process.

As readers examine the characteristics of standardized tests in En-
glish and the recommendations and criteria for selecting and using
them, all might be considered again within the context of prospects
suggested by what appear to be encouraging trends in statewide pro-
grams in testing and assessment.

Problems in Statewide Programs

Encouraging though some trends in programs of statewide testing
may be, serious problems remain.






30 ALFRED H. GROMMON



1. One difficulty is the possibility of construing state programs as
threats to the local control of schools. According to the ETS testing
survey, policies are determined by state boards of education, other
state education agencies, and the chief state school officer. Almost
all funds come from state and federal sources. In most states, tests
are selected by state education agencies and results are reported to
state agencies. In some states, these data are absorbed into the plan-
ning-programming-budgeting system and may become a factor in
decisions affecting the allocation of funds to districts. For example,
according to the Iowa report (Assessment Programs, p. 32), their
Needs Assessment Program will indirectly "provide information for
a planning-programming-budgeting system." The New York State
report includes recommendations from the Fleischmann Commission
that made a two-year study of statewide "cost, quality and financing
of education." One recommendation is that "school achievement ac-
countability be coupled with fiscal accountability in standardized
budgeting and auditing procedures; this system would be estab-
lished by the Education Department. The achievement and fiscal ac-
countability should have as a basic link the statewide comprehensive
information systems to provide facts for long-range planning evalua-
tion and enforcement of state mandates." Such authority and pro-
cedures cause uneasiness among school personnel and lay citizens
concerned about protecting the principle of community control of
schools.

2. Another problem arises from the expressed purpose of using
subject-matter test results to evaluate instruction, programs, and
educational planning, the goals most frequently reported by the
states in the ETS assessment survey of 1973. Teachers are likely to
object to any use of results of, say, a single standardized measure
that includes aspects of English grammar as a means of judging the
effectiveness of an individual English teacher, of all English teachers
in a school or district, or of a total English program. This misap-
plied use of results of a test of a limited scope is for teachers prob-
ably the most threatening single feature of testing programs. Con-
sider, for instance, the headline of the previously cited New York
Times report of the N.E.A. convention: "Accountability Plan An-
gers Teachers, With Many Foreseeing Threat to Job." The existence
of this fear is recognized also in the ETS New York State report re-
ferred to above. In an "Overview" of the Pupils Evaluation Program
(Assessment Programs, p. 59), the report states that "a major prob-
lem has been the tendency among some groups, lacking technical
background, to use the test results in isolation as a measure of the
quality of the educational program."

3. A third problem complicating some testing programs also is
associated closely with teachers' uneasiness about being evaluated
on the basis of test results. As a consequence of this and other fac-
tors to be mentioned later, many teachers have openly expressed
negative attitudes toward programs of standardized testing. For ex-
ample, the report on the Hawaii Statewide Testing Program (Assess-
ment Programs, p. 29) reveals that "among the problems presently
related to the program are the developing negative attitudes towards
all testing, the lack of understanding as to the usefulness of tests by
teacher and student and the difficulty in making meaningful inter-
pretations of results at both the legislative and public levels." A
number of other states reported problems posed by some teachers'
negative attitudes, especially toward the use of norm-referenced and
standardized tests and their apprehension about intended uses of re-
sults. Some states using both norm-referenced and criterion-refer-
enced tests report that while teachers manifest a negative attitude
toward the use of norm-referenced tests, they apparently respond fa-
vorably to uses of criterion-referenced measurements.

Teachers of English have been especially outspoken and resistant. 
For example, in one state that has to comply with a state law requir- 
ing testing in English, several prominent teachers of English and 
specialists in English education were very strongly opposed to state- 
wide testing in English; when the State Department of Education 
tried to develop its own tests rather than purchase "as is" tests, these
teachers refused to cooperate with the Department and with other
English teachers who were helping to develop guidelines for the 
state's own test. 

Such negative attitudes, coupled with the teachers' anxieties
caused by associating pupils' performances with the rating of teach-
ers, can also subvert the purposes and the administration of tests.
For example, in the Oklahoma report (Assessment Programs, p. 69),
there is the comment about the "tendency for a few teachers to teach
for the test." According to The New York Times (April 9, 1971), the
New York City Board of Education was concerned with this problem
as plans were being made to administer the city-wide reading test
that spring: "As a result of several known instances of improper
coaching for the tests by teachers and allegations of other improprie-
ties, the Board of Education has asked Chancellor Scribner to con-
duct an investigation into the citywide conduct of reading tests."

The problems continue there. In reporting the city's plans to ad-
minister the reading tests during the spring of 1974, The New York
Times (April 2) stated that "special efforts have been made by cen-
tral school officials to preserve the integrity of the tests in the face
of recurring charges of cheating and coaching by some teachers who
supposedly want to look good when the scores become public." Dr.
Polemeni, the acting director of the city system's Office of Educa-
tional Evaluation, said, "Unfortunately, there is a certain amount of
understandable anxiety among principals and teachers. They feel
they are operating in a fish-bowl environment and that their profes-
sional futures could be affected if the pupils' scores are not good."

These examples clearly illustrate the major differences between 
evaluation and accountability to the public; the erosive effects of 
some teachers' negative attitudes and misunderstandings; and the 
serious consequences of failures in communication among all groups 
involved. 

4. A fourth problem emerges from the relation between the con-
tent of a specific standardized test in English and the total English
program in a class, school, or district. A single measure may be a
woefully inadequate means of yielding data to be used as the basis of
judging the worth of contributions by a teacher or of an English pro-
gram involving a wide range of cognitive and affective goals, con-
tent, and a variety of relevant experiences. These difficulties become
further complicated by the students' experiences in the array of elec-
tive courses offered throughout the nation. If a student chooses a
wide assortment of electives in literature, dramatics, and indepen-
dent study and few, if any, in language and composition, then how
well is that student likely to perform on a standardized test that is
largely a measure of his command of specific aspects of English
grammar, usage, punctuation, spelling, capitalization, and composi-
tion?

5. A problem related to the preceding is that caused by any sig- 
nificant discrepancy between a teacher's concept of the subject and a 
concept that underlies a standardized test. Suppose the teacher has 
a modern, informed, and somewhat flexible point of view toward the
nature and acceptable uses of the English language — particularly 
toward the language a child acquires in the linguistic environment of 
home and community--and teaches accordingly. Then the student is 
expected to perform well on an external standardized test reflecting 
a traditional, restrictive concept of the English language and its 
usage, an instrument designed presumably to examine a person's 
knowledge and command of the language. Teachers are disturbed 
also by discrepancies between their efforts to stimulate students to 
read and read and read and to respond to literature as an expression 
of human experiences, and the kinds of external tests on literature 



ERLC 



33 



ACCOUNTABILITY PROGRAMS OFTESTINC AND ASSESSMENT 33 



that Students have to take. After providing them with enlarging ex* 
periences with literature, teachers often have to administer standar- 
dized literature tests that draw mainly, if not exclusively, upon stu- 
dents* memorized information about authors and literary selections* 

Regrettably, performances on these kinds of tests are intended to
be readily measured and quantified. However inadequately these re-
sults may represent the total effects of a comprehensive English pro-
gram upon students, these statistics have the virtue of convenience
and accordingly are just the kind of information most likely to be
forwarded to state educational agencies and to be publicized in the
local community. Upon this kind of reported information, educa-
tional authorities, parents, and other lay citizens judge the quality of
their teachers and schools.

6. Linguistic concepts represented by items of some standardized
tests constitute a crucial problem: that of cultural bias contaminat-
ing linguistic items in a test or indeed the whole test. Creating a
"culturally free" test may be impossible; but creating a "culturally
fair" one may not be. Such a test is particularly essential in any ap-
praisal of a child's language. Pupils throughout our public schools
represent, of course, the full range of the diversity of our culture.
The languages and dialects children bring from home and com-
munity vary accordingly. In 1974, the United States Supreme Court
handed down a decision ordering the San Francisco public schools
to ensure the teaching of English to almost 1,800 non-English-
speaking Chinese children. But the success of efforts to help all chil-
dren gain increased facility in language depends, in large part, upon
teachers' familiarity with languages and dialects, their acceptance of
them, and their capacity to capitalize in their teaching upon the rich
resources offered by varieties in language. Involved, too, is the mat-
ter of what appropriate adaptations will be made in administering
any program of testing in English.

Two examples of counteractions taken by administrators in the
New York City public schools reported in The New York Times
(April 9, 1971) illustrate their convictions that standardized reading
tests used by the city schools were biased against black and Puerto
Rican children and illustrate also the political processes of educa-
tional accountability. Rhody A. McCoy, a black, was administrator
of the former Ocean Hill-Brownsville demonstration district from
1967-1970. During that time, he refused to permit schools under
his jurisdiction to use the reading tests distributed in the city-wide
testing program. He considered the tests to be biased and unfair to
black children.






In 1971, Alfredo Mathew, Jr., then the city's only Puerto Rican
district superintendent, refused to send to the school board head-
quarters the test papers of 5,000 children in his district who took the
reading tests for primary level. He charged that standardized tests
"compound the tyranny of testing" and are particularly unfair to
pupils deficient in their use of the English language. Those tests
prevent some youngsters "from having an opportunity to demon-
strate their skills and serve as a self-fulfilling prophecy of failure."
He said further:

Although we strongly believe in educational accountability and
maintenance of standards by the central Board of Education, we
cannot accept those standardized tests which may not be fair to
our children.

Therefore, I am asking for the cooperation of principals, staff
and parents to carefully analyze the test items and test formats of
particular levels of the Metropolitan Achievement Tests so that
we can recommend constructively what should be done with this
year's tests.

Standardized tests can be biased not only in content of items and
format but also in the way results are used to predict a pupil's aca-
demic performance.

Complexities in the treatment of bias in standardized tests are fur-
ther illustrated by the reactions of Dr. Kenneth B. Clark, black psy-
chologist, educator, and member of the New York State Board of
Regents, the state's highest board in establishing policies in educa-
tion. He objected to the city's contract with the Educational Testing
Service; his objections are somewhat the reverse, however, of what
might be expected in a case of bias. The test created during that
three-year contract was administered in selected schools in the New
York City public schools in 1974. As was mentioned earlier, this test
was considered by Chancellor Irving Anker to be a pioneering at-
tempt to separate school and nonschool factors affecting a pupil's
learning and to compare his or her academic achievements with
those of pupils in somewhat similar circumstances.

But, according to The New York Times (March 19, 1971), Dr.
Clark "assailed" the Board of Education because he disapproved of
the concept of accountability held by Dr. Henry S. Dyer, then a vice-
president of ETS. He opposed what he considered to be a de-empha-
sis upon basic skills, overemphasizing "such variables as the back-
ground and environment of children." He advocated, instead, the
Board's holding teachers and supervisors accountable for pupils'
academic achievement. In discounting the importance of taking into
account a child's background, Dr. Clark said: "There is no reliable
evidence, for example, that the density of population, race, income
of parents and so on, in and of themselves, prevent a child from
learning to read. There is multiple evidence that children can be
taught by effective teachers without regard to children's back-
ground."

Complicated though the whole process may be, the search for bias
in standardized tests and the effort to create culturally fair tests
must continue.

7. The use of standardized tests to meet the public's demands for
accounting to the public poses another problem that seems to be es-
pecially irksome and distasteful to many English teachers: having
to reduce, in their terms, goals and teaching of English to the sim-
plistic level of quantifiable behavioral objectives. English teachers
seem to be in the forefront of those teachers who protest against be-
ing expected or required to formulate behavioral objectives for their
teaching and their pupils' learning, particularly into objectives that
can be quantified. Their alarms even increase when results of such
measurements are integral to a statewide PPBS, and thereby may
become tied to appropriations.

All this notwithstanding, many other teachers of English do not
oppose using behavioral objectives. They apparently do not find
them incompatible with their concepts of their roles as English
teachers. Some analyses on the range of attitudes toward this contro-
versial, complex matter are presented in the following NCTE publi-
cations: On Writing Behavioral Objectives for English edited by
John Maxwell and Anthony Tovatt; Accountability and the Teach-
ing of English edited by Henry B. Maloney; and Systems, Systems
Approaches, and the Teacher by James Hoetker with Robert Fichte-
nau and Helen L. Farr.

8. A final problem to be identified here is related to the practice
of comparing standardized test results to so-called "national
norms" and then of projecting comparisons into judgments about
the quality of teaching. These norms actually are formulated by pub-
lishers of the tests and are based upon performances of pupils who
took their tests in various grade levels in schools representing some
kind of "national sampling." But a problem seems to arise from the
public's assumption that the term "national" bestows an aura of of-
ficial, infallible, universally accepted status upon these norms. As a
consequence, when local results are compared to these norms and
published, pupils, parents, teachers, school administrators, and lo-
cal newspapers and other media tend to give special attention to the
difference. Such inescapable factors as the local learning environ-
ment and other implications seem to get neglected in the media and
the public sector.

Although these norms are indeed based upon a sort of national
sampling, the sampling might have been done several years earlier,
not with contemporaries of pupils currently taking the tests. More-
over, the sampling may not have included adequate representations
of pupils in inner-city schools or remote communities, or pupils who
are culturally different. Furthermore, the tests may be based upon
testmakers' giving inadequate attention, if any, to the range of cog-
nitive or learning styles of pupils. As Alfredo Mathew, Jr., said of the
Puerto Rican children in his district of the New York City schools,
the format of tests may be inappropriate to certain pupils and may
deprive them of an opportunity to demonstrate their skills related to
the subject matter the test was designed to appraise. So perhaps the
reactions of pupils, parents, and other lay citizens to the results of
children's performances on standardized tests might be more realis-
tic if they considered those norms to be "publishers' norms," which
they actually are. They are not "national" in the sense that they are
suited to the diverse backgrounds of pupils represented nationally,
nor in the sense that they have been anointed by some public agency,
such as the U.S. Office of Education. The semantic as well as the ac-
tual differences in such designations may be quite meaningful in lo-
cal communities.

An Illustration from Michigan

Some problems arising from conflicting perceptions of the philos-
ophy, design, and performances in statewide programs may be illus-
trated by some examples from two publications about the accounta-
bility plan in one state. One document is a report written by a panel
of three outside educators (Ernest R. House, Wendell Rivers, and
Daniel L. Stufflebeam) who were under contract to the National
Education Association (NEA) and the Michigan Education Associa-
tion (MEA) to evaluate the Michigan program for educational ac-
countability. The other is a response written by three staff mem-
bers (C. Philip Kearney, David L. Donovan, and Thomas H. Fisher)
of the Michigan Department of Education (MDE).

Early in their report, House, Rivers, and Stufflebeam state that
they believe accountability has important roles at all levels of educa-
tion. They commend Michigan's six-step model of accountability,
and they report finding general approval of the increased use of ob-
jectives-referenced tests.

But they question the wisdom of any state's mandating an almost
"crash" system of accountability that must be put into effect before
specialists in educational research have evolved any relevant stan-
dards and procedures. Also questioned is the degree to which the
state's broad educational goals and standards of minimum perfor-
mance objectives on achievement tests represent the aspirations, cul-
tures, and learning styles of the diverse population throughout the
state, particularly those living in large cities. They are uneasy about
the state's intentions of publishing for parents the state's educa-
tional goals, lest these goals and minimum performance standards
unrealistically raise parents' expectations of what their schools are
doing, or can do, to improve the level of their children's educational
achievements.

At the time that they made their inquiry throughout Michigan, 
they found little evidence that information gained through the sys- 
tem of accountability actually was being used by the governor, the 
legislature, and educational officials as a basis for making educa-
tional decisions. Nor did they find much evidence that local com-
munities received significant help in making their educational deci-
sions. They found most educators interviewed were opposed to the
policy of basing the allocated state funds, to some degree, upon test 
results. 

In the summary of the report they state:

The Michigan accountability model itself has many good fea-
tures. It has stimulated public discussion of the goals of education
and provided direction for state accountability efforts. It has in-
volved educators throughout the state in efforts to develop objec-
tives and it has resulted in pilot forms of objectives-referenced
tests that some teachers have found useful. Overall, the state's ac-
countability work has created an aura of innovation and change.

On behalf of the Michigan Department of Education, Kearney,
Donovan, and Fisher respond to the outside panel's report. They
commend NEA and MEA for sponsoring the evaluation and report
that the Michigan Department of Education welcomes the recom-
mendations and the help the panel's report has given to focusing
"attention and understanding of what is being attempted to improve
the quality of public education. . . ." They consider each charge and
recommendation made by the panel from the point of view that
"through criticism comes growth, and the departmental staff itself
must be accountable if it is to encourage others to be accountable."
Only a few of their responses will be included here to illustrate prob-
lems resulting from differences in points of view and in interpreta-
tion of evidence.

Regarding the charge that undue haste characterized the plan- 
ning and launching of the Michigan plan without waiting for the 
benefits of standards yet to be established by educational investiga-
tors, the spokesmen point out that in the absence of any such re-
search and emergent guidelines, the state had decided to "challenge
the unknown and develop knowledge where none existed." They 
then ask that if standards do not exist, upon what criteria did the in- 
vestigating panel base its judgments? But now masses of informa- 
tion about the statewide accountability program are being rapidly 
accumulated and studied, and, admittedly, problems have arisen in 
a project the scope of Michigan's. 

The two reports differ on many points. For instance, contrary to
the judgments and interpretations of the panel, the Michigan De-
partment of Education believes that the twenty-two statewide goals
have to be general in nature and reports the following:

(a) that objectives in the affective and psychomotor domains have
been developed and are being incorporated in school pro-
grams;

(b) that objectives were developed with the help of hundreds of
teachers, specialists in curricula, and administrators;

(c) that each set of objectives was reviewed by a panel of educa-
tors, other citizens, students, and the Council of Elementary
and Secondary Education;

(d) that the department makes a "strong plea" that local com-
munities and school districts develop their own special objec-
tives and means of evaluation to supplement the state's mini-
mum objectives; the state is assisting communities in these
projects;

(e) that the department has an on-going evaluation in "nearly
1,100 projects in over 500 school districts";

(f) that although the state believes the effectiveness of teachers
and administrators should be evaluated, this process should
not be carried on in a threatening manner, nor should data
from results of tests "be the sole criterion";

(g) that each year, about 30 workshops are conducted throughout
the state to help local educators;

(h) that the department knows of many instances during 1970-1974
in which information from the accountability program
has influenced educational decisions in the legislature, the ju-
diciary, state agencies, and local school districts.

As perceived by the MDE, the main philosophical issue between 
the basis of the state program and that of the panel's judgment is 
this: 

. . . whether there is a common core of objectives that transcend 
local district boundaries and which all schools should help stu- 
dents attain. The department's position is that these objectives do 
in fact exist, that they are identifiable through a rational process, 
and that the effort is worthwhile. 

The following subsidiary issue identified by the department is re- 
lated to a question raised by the panel: 

. . . whether minority children should always be expected to
achieve less and, therefore, be tested with a separate test. The de-
partment makes the assumption that there is no reason why most
children cannot achieve certain minimal skills; therefore, it is ap-
propriate to determine if such skills are being achieved and, if
not, the reasons why. To design a minority group test would cer-
tainly be possible, but the question is, Should it be done? The
staff say no!

The fundamental purpose of the program, especially of the perfor-
mance-based compensatory education program, is "to demonstrate
that Michigan's children, regardless of race, family circumstances,
or geographical location, can acquire basic school skills for adult
survival."

This exchange between the NEA-MEA panel and the Michigan 
Department of Education is instructive indeed. The department 
commends the panel for helping increase the attention given to the 
state's accountability program. Concerned educators and other citi- 
zens elsewhere should also thank the authors of the two reports for 
focusing what otherwise may be diffuse attention upon some central
issues, upon evidence and progress, and upon the need for holding
accountability systems accountable. The debate illustrates complex-
ities inevitably resulting from extensive programs venturing into new
educational territories. Undoubtedly, other states will also be eval-
uating their programs. English teachers should get involved in the
process or at least keep well informed on such developments.






Notes 

1. Center for Statewide Educational Assessment and the ERIC
Clearinghouse on Tests, Measurement and Evaluation at Educa-
tional Testing Service in collaboration with Education Commission
of the States, State Educational Assessment Programs, 1973 Revi-
sion (Princeton, N.J.: Educational Testing Service, 1973), p. 1.

2. Educational Resources Information Center Clearinghouse on
Tests, Measurement and Evaluation and Office of Field Surveys at
Educational Testing Service in collaboration with Conference of Di-
rectors of State Testing Programs, State Testing Programs, 1973 Re-
vision (Princeton, N.J.: Educational Testing Service, 1973), p. 1.

3. Henry M. Levin, "A Conceptual Framework for Accountability
in Education," School Review 82, no. 3 (May 1974): 363-391. Levin
presents an informative analysis of each of these concepts and of
some implications of schools' being accountable to the entire range
of their constituencies for "proximate" and "ultimate" educational
goals.

4. Frederick McDonald, "Accountability Design Stresses Positive
Aspects," ETS Developments 20, no. 3 (Summer 1973), presents
more information about this point of view and about a model of an
accountability system proposed for the New York City public
schools.

5. Henry S. Dyer, "The State Assessment Survey." Paper delivered
to the Association of American Publishers, Washington, D.C., April
29, 1971 (mimeographed).

6. Assessment Programs. fKfjf), 

7. Garlie A. Forehand, "Evaluation, Decision-Making, and Ac-
countability," in Accountability and the Teaching of English, ed.
Henry B. Maloney (Urbana, Ill.: National Council of Teachers of
English, 1972), pp. 23-33. Emphasis added.

8. Assessment Programs, p. 7.

9. Testing Programs, p. 7.

10. Assessment Programs, p. 7.

11. R. E. Stake, "School Accountability Laws," The Journal of
Educational Evaluation 4, no. 1 (February 1973): 1-3, as quoted in
Assessment Programs, p. 7. Emphasis added.

12. Assessment Programs, pp. 10-11.






13. Testing Programs, p.

14. Testing Programs, p. 7.

15. Henry S. Dyer, "The Functions of Testing--Old and New," in
Testing Responsibilities and Opportunities of State Education
Agencies (Albany, N.Y.: New York State Education Department,
1966), pp. 63-79, as quoted in Assessment Programs, p. 7. Emphasis
added.

16. Assessment Programs, p. 58.

17. Ernest R. House, Wendell Rivers, and Daniel L. Stufflebeam,
"An Assessment of the Michigan Accountability System," Phi Delta
Kappan 55, no. 10 (June 1974): 663-669. The full report may be ob-
tained from the National Education Association or the Michigan
Education Association.

18. C. Philip Kearney, David L. Donovan, and Thomas H. Fisher,
"In Defense of Michigan's Accountability Program," Phi Delta
Kappan 56, no. 1 (September 1974): 14-19. "This article is drawn
from a 35-page booklet titled A Staff Response to the Report: An
Assessment of the Michigan Accountability System."





Language Development and Its Evaluation

Walter Loban



Power over language is not some sudden burst, like a Fourth of July
skyrocket; rather it is like a plant growing and interacting with its
environment. Teachers have an interest in knowing whether or not
their pupils are advancing at a reasonable rate in command of lan-
guage, and whether or not ways to demonstrate that growth can be
determined through the use of published tests.

Such an interest requires that the main features of effective lan- 
guage behavior first be identified and then evaluated. These main 
features are concerned with power in oral language even more than 
with reading and writing, for the living language is the spoken
language. Yet, no published test attempts to appraise the spoken word,
although a few do seek to evaluate listening. Thus, at the very outset,
one needs to be cautious about tests for language growth.

If the central concern, spoken language, is missing, what remains
to interest schools in these published tests? The question is of
enormous importance, for tests do influence the curriculum, and if they
deal with peripheral rather than central language concerns, the
taxpayer's money is wasted upon misdirected teaching time and upon
materials purchased to further minor objectives.

To reduce power over language to such mechanics of written lan- 
guage as spelling, punctuation, and capitalization is a dangerous 
oversimplification. These are not the true fundamentals of language. 
The people of this nation need instruction that focuses not only upon 
details but also upon larger adaptations, such as vigor of thought
and precise expression of thought and feeling. A perspective that
begins with errors of mechanics rather than with a more complete
picture of desirable accomplishment seldom reaches to the really
important aspects of language ability: interest, pleasure in doing or
using, organization, purpose, and other crucial integrating and
dynamic patterns of performance. To be sure, punctuation has its
limited importance, but it is not as important as having something to
say or purposeful organization.

With this in mind, the language development tests were examined
for the help they might offer in assessing such matters as the use of
language (a) to put order into experience and (b) for clarifying
thought, feeling, and volition by making distinctions, modifying
ideas, and controlling unity through arrangement and emphasis.
The possibility that paper and pencil tests can show much about
such aspects of language is not very great, but can they make any
contribution?

Searching among published tests for any which might help to 
chart the development of language ability is a disheartening task. 
Condemnation is easy but scarcely a positive action. We need to ask 
what should be evaluated in language development and how it 
should be evaluated. In the interest of better language instruction
(for what is taught in the schools shrinks inevitably to what is tested
or evaluated), let us try to answer these questions as best we can.

At the heart of the matter, in this reviewer's opinion, is the
relation between the schools and the society that shapes them. Is it not
possible that in a democracy, state educational agencies could assist
schools by identifying and clarifying the goals of instruction in
language, sifting what is significant and crucial from what is
contributing and subordinate? This assistance would bring into focus the
relation language bears to personal and mental development. Instead
of administering statewide tests of extremely limited coverage, the
state could urge or require its schools to evaluate the significant 
goals that have been identified and suggest feasible methods of do- 
ing so. In monitoring evaluation, the state might encounter districts 
in which the evaluation was, for one reason or another, inadequate. 
In such cases, the state could suggest improvements or the model of 
some other school district. 

Emphasis on oral language development is essential to any
reformed curriculum. An important reason for its present neglect is
the complete absence of oral language in all language testing,
whether it be college entrance examinations or elementary school
testing. Yet, oral language, by its very nature, cannot be reduced to
paper and pencil tests, nor do we know of any variables, amenable to
paper and pencil testing, which correlate with oral language power.
Even if precision cannot be achieved, evaluation is still crucial.
Without it, the curriculum will continue to neglect this basis for
reading, writing, and appreciation of literature and they, in turn,
will suffer.






With the development of tape recorders, videotape, and cassettes,
the drawbacks to evaluating oral language have diminished. The
objection that taping requires too much time and money can be met
easily by using sampling procedures. It is not necessary to record
every pupil. In a class of thirty, a random sample of six pupils can
demonstrate growth if the sampling occurs in September, February,
and May (individual pupils with special or severe problems can be
recorded and studied more intensively). Class or group discussions
can be recorded in similar situations at the opening and closing of a
semester or a year, and from one school year to the next.
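
The sampling procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster names, the function name `recording_schedule`, and the choice to follow the same six pupils at all three recording points are hypothetical, since the text does not say whether the sample is redrawn each time.

```python
import random

# Recording points named in the text.
RECORDING_MONTHS = ("September", "February", "May")


def recording_schedule(pupils, sample_size=6, seed=None):
    """Draw one random sample of pupils and schedule it for every recording month.

    Assumption: the same pupils are recorded each time so that growth
    can be compared from one recording to the next.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(list(pupils), sample_size))
    return {month: list(sample) for month in RECORDING_MONTHS}


# Hypothetical class of thirty pupils.
class_roster = [f"Pupil {n:02d}" for n in range(1, 31)]
schedule = recording_schedule(class_roster, seed=1976)

for month, names in schedule.items():
    print(month, names)
```

Fixing the seed makes the draw reproducible, which would let a second rater confirm that the sample was not hand-picked by the teacher.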

Rating scales can be used to identify and check the kinds of items 
already discussed. Reliability and validity can be increased by add- 
ing to the number of raters and by having the ratings carried out by 
persons, other than the teacher, who do not know the pupils. The
expense of employing such raters would be much less than the money
now expended upon published standardized tests.
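
The claim that adding raters increases reliability can be illustrated with the Spearman-Brown formula, a standard psychometric result that the text itself does not name; it is brought in here only as a sketch. The single-rater reliability of 0.50 is a hypothetical figure, and the formula speaks to reliability only, not to validity.

```python
def pooled_reliability(single_rater_reliability, n_raters):
    """Spearman-Brown estimate of the reliability of the mean of n_raters ratings.

    Assumes the raters are interchangeable (equal reliability, independent errors).
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical starting point: one rater alone has reliability 0.50.
for k in (1, 2, 4, 8):
    print(k, "raters:", round(pooled_reliability(0.50, k), 2))
```

Under these assumptions each added rater helps, but with diminishing returns, which bears on how many outside raters a district would need to employ.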

The development of language power is infinitely more comprehensive
and complex than any available standardized test indicates. For
this reason, this reviewer cannot in professional good conscience
recommend any available standardized test as a valid measure of
children's linguistic ability.

Insofar as valid tests can be constructed at all, test creators will 
need to consider the goals described below in any serious evaluation 
of children's ability to gain power in language. Inasmuch as the
living language is the spoken language, most of the abilities listed
below should, if firmly developed, contribute also to power over the
written language as well as to reading and listening.

Usage that does not distract the attention of the listener. Dialects 
and various levels of formality may be appropriate, depending upon 
the situation. 

Clear enunciation and articulation.

An ability to cleave to the point without too much qualification,
modification, and random associations. This is true for
conversations, group discussions, and individual speeches before the class.

Clarity or organization, ability to develop one idea at a time,
precision in the use of language, wealth of vocabulary: these assume new
importance when the effect upon others is the basic criterion of
successful expression.






Ability to get to the point, cleaving to the heart of the matter in
discussion, panels, and group work.

Improvement in ability to stick to the point. Do pupils make
progress in discussing a point? Are their illustrations pertinent to the
concept under discussion?

Are pupils aware of the levels of formality in language? Do they
know when to use slang, colloquial, informal, and formal usage?
Do they adapt their language to the occasion?

Are they able to use a standard language appropriate to the
situation: not offensive, not distracting from the idea? (Examples of
nonstandard usage: I seen it; Her and me had a quarrel; I ain't got none
of them there new shirts.) Are they learning why they should avoid
profanity, stale slang, delicate subjects, libel, and smear words?

Are they better able in conversation and group discussion to 

a. make distinctions?

b. modify ideas? 

c. control unity and coherence through transitions and arrange- 
ment? 

d. use emphasis to full effectiveness?

A moderation in speed of speech, neither too slow nor too rapid;
variation in rate and volume, appropriate to content.

Vitality — involvement; energy of speech. 

Effective clustering of words and phrases — and pauses (clustering 
that contributes to the meaning). 

A resonant voice, varied in pitch; characterized by an imparting
tone; emphasis (stressing of words) helpful to the listener.

Are the participants on panels and those who give oral reports
effectively adapting their manner and presentation to their audience?

Are they using an imparting tone and appropriate gestures?

Are primary school children learning to adjust their voices (volume,
intonation) to their hearers? (Also their behavior, such as looking
at hearers?)

Are pupils learning to adapt their rate of speech to the situation and 
the ability of their listeners to follow their ideas? 

Drama is frequently used to foster growth in oral language. Through
it

a. pupils extend the range, fluency, and effectiveness of their
speech,

b. words move from a passive recognition vocabulary into active
use,

c. words are increasingly used with meaningful intonation.

Awareness of the patterns of sound linked to thought will manifest
itself in

a. sensitivity to standard usage,

b. concern that others will receive one's communication without
distraction,

c. a distaste for sloppiness and, therefore, distaste for whatever
violates grammatical concord.

The pupil must become aware of important rhetorical goals: 

a. the strategies of emphasis,

b. the skills of exemplifying and generalizing,

c. the importance of unity and relevancy, gained through imposing
order and structure that are dynamic, not mechanical.

A sincerity that enables the words to flow more easily. 
Poise from inner security and confidence. 

A stable personality, free from timidity, self-depreciation,
contentiousness, egocentrism, and all such traits that reveal themselves in
speech or manner.

Developing respect for diversity of opinion. Are they learning to
welcome differences of opinion because such differences motivate
thought? Are they learning to differ without rancor?

Are class discussions showing improvement in courtesy, mutual
respect, and thoughtful attention to the feelings and dignity of
everyone in the class?

Are pupils gaining the skill to retreat gracefully from an untenable 
position and to modify their ideas in the light of new evidence? 

Do students feel a duty and obligation to express their point of view,
even when it is unpopular, so that through democratic process, the
group has access to all sides of an issue?

Do students speak without self-consciousness to students of other
racial, ethnic, cultural groups? Other socio-economic levels?

Are the very talkative pupils discovering terminal facilities for
speech? Are the quiet and laconic pupils using more and better
language?







Are all the pupils learning to use language to put others at ease,
cheer them up, or draw out their ideas?

Are they learning to discard rigid dogmatic statements and replace
them with a "positive tentativeness"?

Are the students gaining in personal poise and self-reliance?

Are students showing any signs of asking for authority and of
judging the sources from which they get their information? Do students
recognize when statements are backed by opinion rather than fact,
and do they feel an obligation to cite sources and facts when these
are pertinent?

Are informal business meeting procedures and parliamentary
procedures enabling pupils to

a. get things done by handling one thing at a time?

b. give the minority a hearing?

c. see that the majority opinion prevails?

Are they learning to detect basic assumptions? The difference
between significant and insignificant knowledge?

Are they learning techniques of group discussion and efficient use of
time?

Is there wide participation in class discussions?

Do they show a concern for truth in language; do they know why
perjury is such a serious crime and so carefully watched in courts?

In addition to the above goals, test creators should consider the
findings of research. Research shows that certain language
behaviors characterize growth and command. The pupils use fewer short
oral utterances; express tentativeness more frequently through
statement of supposition, condition, or concession; use more analogies
and generalizations; and excel in coherence because they use
effective subordination of all kinds: nonfinite verb phrases,
prepositional phrases, absolute constructions, and appositives, as well as
adjective clusters and dependent clauses.

Not grammatical sentence pattern but what is done to achieve
greater flexibility and modification of ideas within these patterns
proves to be the real measure of proficiency with language. Growth
in this attribute is important.

Expression can be improved. Dexterity with oral language can be
advanced by identifying the elements of language which strengthen
or weaken communication, that increase or lower precision of
thought, that clarify or blur meaning. Many of these elements would
be such matters as liveliness and energy of speech; sticking to the
point; reducing the mazes or language tangles that so often result
from too much qualifying, timidity, insecurity, or failure to realize
how the listener reacts to so much hesitation.

Increased attention needs to be focused on oral language, not just
talk and chatter, but rather on what might be called thinking on
one's feet, i.e., learning to organize or pyramid ideas; to cleave to the
heart of a topic; to make progress with ideas; to generalize when
enough illustrations have been given; and to illustrate when
generalizations are complex or new to listeners.

Teachers can help pupils to compare, contrast, categorize, and
impose structure on loose material. Teachers can also help pupils to
use analogy; become more proficient in synthesis by showing how
things go together; use induction from particulars to
generalizations; use analysis showing how to take ideas apart; use deduction
from concepts to particulars.

Pupils and teachers need to be concerned with the good organiza- 
tion of ideas and good coherent thinking; having something to say 
and organizing it in terms of a purpose; the ability to grapple with 
expressing one's own ideas or receiving ideas one wants to hear;
finding appropriate words to clarify and organize thinking about ex- 
periences, feelings, and thoughts; a greater facility with language 
that emerges because one is forced to use language in widely varying 
situations. 

All of these goals, other than those directly concerned with
acquiring and using standard English, are relevant to speakers of
social-class dialects also.






Elementary School Language Tests 

William A. Jenkins



This introduction is an inductive statement growing out of specific
reactions to the tests reviewed and presents conclusions which
should have relevance for all who wish to choose language tests for
use in the elementary school.

It seems very clear that the authors and publishers of language
tests do not recognize the limitation of paper and pencil tests in
measuring language arts abilities. Some abilities, recognized as
important by every teacher, simply do not lend themselves to paper and
pencil tests. These abilities are often thought of as residing in the
affective domain, or are called higher-level abstractions or
higher-level learnings. However they are defined, they are ignored by the
authors and publishers presenting tests which purport to measure
all that is important about the language arts and in language arts
abilities and skills. Any analysis, even one as perfunctory as some of
those made here, shows quite certainly that the tests do not do this.

It is also clear to this reviewer that the manuals which accompany
the tests are overcharted and overgraphed. This reviewer concluded
that the technical analyses and statistics do add to the mystique of
test making and test writing, but their educational value must be
questioned. Aside from providing information for statistical experts,
other test writers, and school administrators who need statistical
evidence to back up their claims about the quality of education in their
schools, they are of little use. Actually, they tend to clutter up the
teacher's work.

This reviewer also felt that test writers oversimplify language in
their desire and attempts to reduce it to the relatively mechanical
operation of taking a standardized test. Moreover, because of their
inability to measure the more complex language elements, this
oversimplification is a distortion.






A broader context is needed to measure language arts abilities 
than it is possible to give in most of these tests. Situational analyses, 
where the skills actually function, appear to be what is required. 
Such situations could get at the pupil's ability to organize his or her
thoughts, to relate ideas to each other, to distinguish between ideas,
to create word thoughts and pictures, and to recognize the difference 
between using oral and written language. 

Without a doubt, standard English is the primary, if not the only,
dialect recognized by the test writers represented in the tests
reviewed here. These writers recognized few, if any, dialectal
differences and deviations.

All of the tests reviewed are strong in measuring selected items of 
achievement, but are just as weak as diagnostic instruments. Lest I 
be misunderstood, with one or two exceptions they were labelled 
achievement tests, but in this reviewer's mind measuring
achievement is far less useful and perhaps less valid than analyzing
children's areas of weakness in some depth so that they may be taught
better. The test writers represented here apparently would like to
palm off the notion that recognizing an error in spelling, capitaliza- 
tion, or punctuation in an exercise is the same thing as having the 
ability to use language correctly. This is not the case, and most alert 
educators know it. 

People such as Paul Diederich of the Educational Testing Service 
have discovered some unusual things about tests. Diederich points 
out, for example, that, in measuring growth in achievement, stu- 
dents with the lowest initial scores gain most on post-tests, while 
those in the middle gain less; those with the highest initial test scores 
gain little or even regress when taking the post-test. Whether tests
measure appreciation, attitude, or insight into human relations, or
whether they measure knowledge of any sort, these results are the 
same. 

It has also been found that when a teacher gives a published test
that measures almost any skill that develops more or less
continuously (such as learning to read, write, or do arithmetic) at the
beginning of the school year and a parallel form of the same test at the
end of the year, the average score is practically certain to rise.
Diederich says that the results are influenced by what is known as
the ceiling effect, by regression, and by unequal units of
measurement; that is, a gain from 80 to 85 percent is many times harder to
make than one from 30 to 60 percent.¹
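
The regression and ceiling effects Diederich describes can be reproduced in a small simulation. The sketch below is illustrative only: the score distribution, the size of the measurement error, and the function name `simulate_gains` are assumptions, not figures from Diederich. Each simulated pupil has an unchanging true score (that is, no instruction takes place), and the pre-test and post-test each add independent random error, capped at a ceiling of 100.

```python
import random
import statistics


def simulate_gains(n_pupils=3000, noise_sd=8.0, ceiling=100.0, seed=0):
    """Mean post-test minus pre-test gain for the lowest, middle, and highest
    thirds of pupils, ranked by pre-test score, when nothing is learned."""
    rng = random.Random(seed)
    pupils = []
    for _ in range(n_pupils):
        true_score = rng.gauss(60.0, 12.0)  # hypothetical ability distribution
        pre = min(ceiling, max(0.0, true_score + rng.gauss(0.0, noise_sd)))
        post = min(ceiling, max(0.0, true_score + rng.gauss(0.0, noise_sd)))
        pupils.append((pre, post))

    pupils.sort(key=lambda scores: scores[0])  # rank by pre-test score
    third = n_pupils // 3
    groups = {
        "lowest": pupils[:third],
        "middle": pupils[third:2 * third],
        "highest": pupils[2 * third:],
    }
    return {
        name: statistics.mean(post - pre for pre, post in members)
        for name, members in groups.items()
    }


gains = simulate_gains()
for name, gain in gains.items():
    print(f"{name:8s} mean gain: {gain:+.1f}")
```

Under these assumptions the lowest third shows an apparent gain and the highest third an apparent loss even though no pupil's true score has changed, which is the pattern the text attributes to regression.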






These findings raise some interesting questions. For example, if
school administrators know that the poorest scoring children will do
better on a post-test than on a pre-test, why should they bother to
give the pre-test? With no instruction, the post-tests will give
higher results than the pre-tests. By the same token, perhaps the school
administrator should not allow the brightest children to take a
post-test because they may regress and show the school in a bad light!
Students in the middle group probably should be treated as those in
the middle are usually treated (that is, be ignored) because the test
results will show that the children have neither progressed nor
regressed and thus will neither help nor hurt the school.

It is frequently claimed that one of the reasons for testing students
is that taking a test is an experience which reinforces learning. But
according to Balch,² the opportunity to learn frequently is lost,
learning principles are violated, and the potential learning is
destroyed. Balch says that learning is always affected both by external
elements in the situation, including the amount, organization,
complexity, and meaningfulness of the material to be learned, and by
internal factors which are characteristic of the learner. One wonders
again how a testing situation can be standardized, although test
after test claims that it can be, when we have just pointed to five
variables that must be taken into account and which certainly cannot be
standardized from one testing situation to another. The results
absolutely must vary, according to Balch.

All of the above limitations and characteristics of tests perhaps 
are overshadowed by a condition which appears to make most of the 
tests reviewed in this section inappropriate for 90 percent of the chil- 
dren in our society: the tests are all written in standard English, ig- 
noring practically every other major dialect. Let us look at this phe- 
nomenon. The student most disadvantaged by these tests is the 
black child. To the majority of black children, being black is almost 
synonymous with being poor. It is this poverty which prevents many 
of them from having the kinds of experiences that support the
instructional programs of the schools. They are too poor to take trips
to cultural facilities. They are too poor to have books and
educational toys in their homes. They are too poor to enjoy all of the objects
and services commonly a part of the experiential background of the
middle class Caucasian child. They cannot satisfy their educational
needs outside of the school. All of these cost money and most black
families cannot afford them.

Dr. Kenneth Johnson, a black educator and linguist, points out
that to the black child membership in that minority group increases
the chances of being culturally disadvantaged.

Culture can be defined as a way of life, a design for living, that
consists of the attitudes, beliefs, practices, patterns of behavior,
and institutions that a group has developed in response to
particular conditions in order to survive. In this country the conditions
that existed for the majority of the people have produced the
response labelled, "the dominant culture." Black people, however,
have had to respond to a different set of conditions, and they have
developed a sub-culture that is different in many ways from the
dominant culture. . . .³

Dr. Johnson goes on to say:

Membership in the black sub-culture contributes to cultural
deprivation because it prevents black children from acquiring the
middle class cultural patterns by which almost all school
curricula and instructional materials are based. Many black children
have not acquired from their sub-culture the language patterns,
the value system, the attitudes and beliefs — the entire experiential
background — that the school program demands.⁴

Probably the most pervasive effect of being a minority group
member, coming from a minority subculture, and being raised amid
poverty is that the individual child develops a value system which
often is radically different from that of the majority. For example,
learning and school have negative valences rather than the
traditional positive ones for minority children. This is explained, in part, by
the fact that the child's environment is the negative one of large-city
ghettoes. Such an environment restricts the experiences of these
children, and the concepts their experiences yield are not those on which
the school program is based. To go further, it can be pointed out
that children living in a noisy ghetto under crowded conditions and
surrounded by much activity are bombarded with stimuli. But they
learn to shut out these stimuli in order to have peace of mind. This
habit, it is said, becomes a hindrance to them in school because they
then shut out the instructional stimuli provided by teachers.
Evidence for this claim is the fact that the majority of ghetto children do
not have the ability to distinguish meaningful sounds equal to that of
the typical suburban child. Another example of a different sort is
that ghetto children tend to be aggressive. Because they value
aggressiveness over intellectualism, working in groups, one of the
chief instructional modes of the school, is antithetical to what they
have learned outside of school. Thus ghetto children, and here we
are speaking primarily about black children, see few benefits
coming from intellectualism as they encounter it in their environment
and in many cases are antagonistic toward it.

The disadvantages of black children definitely extend to their lan- 
guage. Because they speak a nonstandard dialect, a number of edu- 
cators and linguists believe their language interferes with their at- 
tempts to read and to speak standard English. They are frequently 
viewed as being deficient, while in actuality they may be merely
different. In their use of language, they are as creative, as intrepid, as
effective in communicating as anyone using his or her dialect.

These ideas can be reinforced by pointing to the problems of other 
minority groups. For example, English proficiency tests which have 
been prepared for native speakers of Spanish will not be entirely 
appropriate for native speakers of any other language. As long as
one constructs tests that consist mainly of vocabulary, this statement
is not a valid one. But for tests which involve more than vocabulary,
the statement demands attention when one considers the importance
of slight differences in the syntax and phonology of languages
revealed by contrastive language analysis.

A different problem might be faced by the Appalachian white 
child or the ghetto disadvantaged white child, in contrast to the 
black child who may suffer because he or she is physically
conspicuous. When this condition exists, divergences from the teacher's
dialect are likely to be ascribed to innate ignorance. This contrasts with
the black child's divergences, which are frequently ascribed to race.
As Raven McDavid points out, both ascriptions are equally fallacious.

The disadvantaged white uses many non-standard grammatical 
constructions. This is almost tautological, since it is the advan- 
taged who in the long run determine what the grammatical stan- 
dard is and should be. Where the discrepancy between educated 
and uneducated speech is greatest, as in the south, the incidence 
of such non-standard forms will be the highest. The only caution
is that there are wide variations in the extent to which the various
subcultures tolerate deviations from the norms of formal
expository prose.⁵

In summing up all of these differences for the black, for the
disadvantaged white, and for the Spanish-speaking child, one must
underscore that the teacher avoid forcing an external standard upon
students. This reminder is an elementary principle of learning and
of language teaching. It is well known that the teacher should simply
adapt techniques to the structure of the students' dialect and let the
overwhelming power of the culture do its work. However, when one
is using a standardized test indelibly written in the language of the
major culture, that is, in standard English, good teaching becomes
impossible and principles of learning are superseded by fairly rigid
principles of test giving.

It is a gross oversimplification to say that most of the ills in
published tests or in textbooks arise out of the profit-making motives of
the publishers and their desire to be publishers for all of the people.
But test publishers do have to play down racial, regional, and dialect
differences, and have to ignore certain racial and ethnic minorities
if their tests are to sell successfully. Just as it is unprofitable for
publishers to provide a wide variety of textbooks for the same grade
because of the widely differing background of the pupils who will be
using them, so it is also uneconomical for publishers to put out a
series of tests covering the same grades and ability levels but keyed to
the widely divergent cultural, social, economic, and psychological
backgrounds of the children to be tested. In the past they mainly
have had to ignore these differences and design materials for a
general national market. The results may have been economically
profitable, but educationally they verge on bankruptcy. The tests, as will
be illustrated in the individual reviews, simply do not do what they
purport to do, nor what this reviewer thinks they ought to do.

Test publishers, as well as textbook publishers, should base more
of what they do upon what goes on in what are, in effect, the several
different school systems found in a city: the ghetto schools, the
inner-city schools, and the suburban schools. Standardized tests are
directed at inner-city and suburban schools, while ghetto schools are
usually ignored. Publishers who would consider putting together
tests for the latter might examine James Herndon's The Way It
Spozed To Be, Herbert Kohl and Victor Cruz's Stuff, Sylvia
Ashton-Warner's Teacher, and materials coming out of the Watts Writers
Workshop as a first step.

It is clear to this reviewer that the chief deficiency of the tests
reviewed is that they tend to ignore the several cultures which make up
our society. This deficiency is serious. If education is anything, it is
the understanding of and induction into one's own culture, whatever
it may be. Unfortunately, these tests evidently measure
understanding of only one culture.

Notes 

1. Paul B. Diederich, "Pitfalls in the Measurement of Gains in
Achievement," in Classroom Psychology: Readings in Educational
Psychology, 3rd ed., eds. William C. Morse and G. Max Wingo
(Glenview, Ill.: Scott, Foresman and Company, 1971), p. 338.

2. John Balch, "The Influence of the Evaluating Instrument on
Student Learning," in Classroom Psychology, p. 347.

3. Kenneth Johnson, "Blacks," in Reading for the Disadvantaged:
Problems of Linguistically Different Learners, ed.
Thomas D. Horn (New York: Harcourt, Brace Jovanovich, 1970), p.
30.

4. Johnson, "Blacks," p. 30.

5. Raven I. McDavid, "Native Whites," in Reading for the
Disadvantaged, p. 136.






New Iowa Spelling Scale. Harry A. Greene. Iowa City: Bureau of
Educational Research and Service, 1954.

Iowa Spelling Scales. Ernest J. Ashbaugh. Iowa City: Bureau of
Educational Research and Service, n.d.

Buckingham Extension of the Ayres Spelling Scale. B. R.
Buckingham. Indianapolis: Bobbs-Merrill Co., n.d.



The Iowa Spelling Scales (grades 2-8) are lists of words that have
been found to be used widely in the written communication of children
and adults. The difficulty level in a particular grade is given for
each word (a total of 5,507) by the percent of accuracy of spelling for
that word in each grade. For example, the word dandy is spelled
correctly by 8, 32, 48, 68, 82, 85, and 88 percent of pupils in grades
2 through 8, respectively. The scales, then, are not truly tests but
rather are a source of material for teacher-made or standardized
tests. The scales are still useful, although one has to ask whether a
scale developed in 1954 is applicable and valid for children in 1976
when one realizes that the chief increase in vocabulary comes in the
form of nouns. The things children are interested in have changed
considerably in the intervening twenty years.
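
As a rough illustration of how the scales can serve as source material
for a teacher-made test, the sketch below stores the percent-correct
figures quoted above for dandy and finds the lowest grade at which a
chosen share of pupils spell the word correctly. The function name, the
data layout, and the 50 percent cutoff are assumptions made for this
example; they are not part of the scales themselves.

```python
# Percent of pupils spelling "dandy" correctly in grades 2-8,
# as quoted in the review above.
GRADES = [2, 3, 4, 5, 6, 7, 8]
DANDY = dict(zip(GRADES, [8, 32, 48, 68, 82, 85, 88]))


def lowest_grade_at(percent_by_grade, threshold):
    """Return the lowest grade whose percent correct meets the threshold.

    Returns None when no grade reaches it. The threshold is a
    hypothetical cutoff chosen by the teacher, not a value from the scales.
    """
    for grade in sorted(percent_by_grade):
        if percent_by_grade[grade] >= threshold:
            return grade
    return None


if __name__ == "__main__":
    # Lowest grade at which at least half the pupils spell the word correctly.
    print(lowest_grade_at(DANDY, 50))
```

A teacher choosing words for a given grade could apply the same lookup
to every word on a list and keep those whose grade matches the class.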



Hoyum-Sanders English Test. Elementary Test I, form B; Test II,
forms A and B; Intermediate Test I, forms A and B; and Test II,
forms A and B. Vera D. Hoyum and M. W. Sanders. Emporia,
Kans.: Bureau of Educational Measurements, 1964.



The Elementary Test I, form B, includes ten sentence recognition
questions, fifteen capitalization questions, fifteen punctuation
questions, ten contraction, possessive, and spelling questions,
thirty-five usage questions, and ten alphabetization questions. Test II,
form A, and Test II, form B, follow the same pattern.

The Intermediate tests ask ten questions on sentence recognition;
twenty questions on capitalization; twenty questions on punctuation;
ten questions on contractions, possessives, and plurals; fifty
questions on usage; and ten questions on alphabetization. Test I,



form B, and Test II, forms A and B, follow the same format,
although the number of sentences varies slightly.

All questions except the alphabetization, which consists of single
words, are posed in the form of sentences or sentence fragments. A
context of sorts, therefore, is provided. The test can be hand scored
by the teacher. The ease of scoring and the ease of administering
have to be considered strengths of the test. Directions, including
norms and percentiles, are included in a six-page leaflet.

The authors claim that the purpose of these sets of tests is to
measure objectively pupil and class proficiency in the essential
mechanics of English. The authors also say that the tests may be used
for both survey and diagnostic purposes: "They help the teacher
determine pupil and class deficiencies and therefore will lead to better
teaching." The tests, according to the authors, may be used in a
number of ways: (1) for determining pupil achievements; (2) for
checking the efficiency of instruction; (3) for assigning school
marks; (4) for analyzing pupil and class weaknesses; and (5) for
motivating pupil effort.

Compared to other tests, however, these are slim enough to make
one question their validity and reliability. On the other hand, the
authors point out that the norms were established by administering
the test to more than 50,000 pupils in forty-six states. Assuming that
the samples used for evaluating pupil performance are valid, the
tests may be useful. Even so, any test which in today's world
stresses mechanics as opposed to the expression of ideas must
be considered suspect and of limited use.



Kansas Elementary and Intermediate Spelling Tests. Test I, forms A
and B; Test II, forms A and B; and comparable forms for Intermediate
tests. Connie Moritz and M. W. Sanders (Elementary); Alice
Robinson and M. W. Sanders (Intermediate). Emporia, Kans.:
Bureau of Educational Measurements, 1964.



Each of the Elementary tests consists of fifty words, each of which
has been spelled four different ways; the correct one is to be
underlined. The Intermediate test consists of a list of eighty-five
words, with four spellings given for each word; the correct word is
also to be underlined. Both tests have a time limit of fifteen minutes.



One has to question whether this is really the way to test pupils'
knowledge and ability to spell. Recognizing an incorrectly spelled
word, it seems to me, calls upon skills different from spelling and
writing the word correctly.

The words in the test were selected from the Buckingham Extension
of the Ayres Spelling Scale, the Iowa Spelling Scale, the Thorndike
Word List, and a number of recognized spelling tests. These
publications helped determine difficulty level of the words and the
proper grade placement for them. The incorrect spellings themselves
were drawn from a study of pupils' spellings and from the incidences
of choices of spelling on preliminary editions of the various divisions
of the test.

The overall evaluation of the tests has to be made in the form of a
question: does the test do anything that a good review test in a
speller would not do, other than provide norms and percentiles in a
manual of directions?



California Achievement Tests. Levels 1-3, form A. Ernest W. Tiegs
and Willis W. Clark. Monterey, Calif.: California Test Bureau/
McGraw-Hill Book Co., 1970. (See Hook review p. 81 for Levels 4
and 5.)



The California Achievement Tests are designed to measure, evaluate,
and analyze school achievement from grade 1.5 through grade
12. They are presented in machine-scoreable format. Setting the
Mathematics section aside, we find two main sections: Reading and
Language. The first division of Reading, Reading Vocabulary, includes
four pictures, one of which should be marked in response to a sentence
that is read to the children (ten items); in ten other items children
are asked to indicate the first letter of a word that is read aloud; in
ten further items children are asked to indicate the final letter of a
word that is read aloud. Also under Reading Vocabulary are fifteen
items in which children have to match letters; ten items to match
words; ten items to match words and pictures; twelve items to match
words read with a word heard; and fifteen items to match a word in
context with an isolated word. In all cases except one, children are
given one of four choices. In the second division, Reading Comprehension,



children are asked to answer twenty-four questions based on short
reading passages; again, four choices are given.

The second section, Language, covers Auding, Capitalization,
Punctuation, Usage and Structure, and Spelling. Auding includes
five questions that are read aloud, with the choices to be marked,
and ten items in which the children carry out a series of instructions.
Capitalization has twenty-four items in the context of a sentence.
Punctuation has fourteen items in the context of a sentence. Usage
and Structure has twenty items in the context of simple sentences.
Spelling has twenty groups of five words, one of which may be
spelled wrong.

Among the three different Levels of form A, there are few significant
differences from level to level. In Level III there is more emphasis
on Usage and Structure, with forty-one items comprising the test.
And the brief Auding exercise that appears in Level I is not repeated
at other levels.

Evaluation 

There is nothing particularly distinctive about these tests. Once
again, pictures are used in the text. The student has merely to choose
the right item from among the three to five presented. Nothing in the
tests appears to recognize national, cultural, racial, religious, or
socio-economic differences; for example, there is in one picture an
amalgamated or homogenized school building, and the same is true
for depictions of buses, automobiles, airplanes, etc.

The lack of cultural and socio-economic recognition in the test is
probably a weakness. The length of the Examiner's Manual,
eighty-seven pages for form A, is both a strength and a weakness: for
the average classroom teacher, the exhaustive detail may be a liability;
for examining departments of school systems, the thorough treatment
may be advantageous, if it does not overshadow the test itself.
The sections of the test do not appear to be weighted. Therefore, an
evaluator must compare the number of items devoted to specific
language skills and consider whether there is a skewness in the test
development and whether the number of items adequately reflects
knowledge or skill in language use. The sections, from level to level,
include approximately the following emphases:

Reading Vocabulary: 41 items 
Reading Comprehension: S items 
Reading in Books: 40 items
Capitalization: 30 items
Punctuation: 36 items
Usage and Structure: 25 items
Spelling: 25 items 
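
The comparison the reviewer asks an evaluator to make can be sketched
as a simple share-of-total calculation. The figures below are the
approximate item counts listed above; Reading Comprehension is left out
because its count is unclear in the list, and the function name is an
assumption made for this illustration.

```python
# Approximate item counts per section, as listed in the review.
# Reading Comprehension is omitted: its count is unclear in the list.
ITEM_COUNTS = {
    "Reading Vocabulary": 41,
    "Reading in Books": 40,
    "Capitalization": 30,
    "Punctuation": 36,
    "Usage and Structure": 25,
    "Spelling": 25,
}


def item_shares(counts):
    """Return each section's share of the total items, as a percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for section, share in item_shares(ITEM_COUNTS).items():
        print(f"{section}: {share}%")
```

Read this way, an unweighted test gives each skill an influence on the
total score proportional to its item count, which is the skewness the
evaluator is asked to look for.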

Over the years, school administrators and teachers have evidently
found this test useful and of high quality. It apparently is an easy
test for children to take and for teachers to administer, apart from
the very complete Examiner's Manual. National norms based on
more than 300,000 individuals have been developed for the test and
are constantly revised, although the latest edition (1970) simply does
not recognize differences in American society. If the samples are
valid, English-Language Arts teachers, or at least primary grade
teachers, may use the test to diagnose some strengths and weaknesses.
The skills measured are basic to a great many learning areas
other than Language Arts.



Comprehensive Tests of Basic Skills (CTBS). Levels 1 and 2, form
Monterey, Calif.: California Test Bureau/McGraw-Hill Book Co.,
1968. (See Hook review, p. 85 for Levels 3 and 4.)



The Comprehensive Tests of Basic Skills is a series of tests with
alternate forms for grades 2.6 to 12, divided into four levels that
overlap at grades 4, 6, and 8. The batteries test skills in reading,
language, arithmetic, and study skills. They can be used for a survey of
individual and group performance in basic skills and, according to
the publisher, for analysis of learning. They were developed for
national use by students who have been taught by different
approaches. They generally measure achievement.

In Level 1, the forty-item vocabulary test simply does not seem to
be representative of the vocabulary of present-day eight to eleven
year olds. In Test 2, Reading Comprehension, the publisher evidently
has included items of letter discrimination, word discrimination,
and phonics as comprehension. As an example, the student is
asked to choose, from among four words that include keep, basket, and
book, the one in which the k does not sound. This is not a test of
comprehension. Similarly, the student is asked to mark the word which
sounds like meat from among four choices that include met and meet.
It should be admitted,



however, that most items are not like this. They follow the usual
comprehension test pattern of having the student read a paragraph
or two and then answer questions on them. The test of Language
Expression is satisfactory, as are those on Language Mechanics and
Spelling.

The five subtests in Level 2 of interest here are: Test 1, Reading
Vocabulary (forty four-choice items); Test 2, Reading Comprehension
(forty-five four-choice items based on short reading selections,
primarily prose); Tests 3 and 4, Language Mechanics and Language
Expression (items in a letter and in an essay, in which the proper
punctuation or capitalization mark must be filled in for the first
twenty-five items and the most appropriate missing word must be
filled in from context for the usage items, a total of fifty-five
items); Test 5, Spelling (thirty groups of five words each, with one or
no words misspelled in each group).

Each of the tests is accompanied by an examiner's manual. The
manual describes the test, gives directions for administering it,
makes suggestions on scoring, reporting, and interpreting the results,
and provides norms for the test scores.

Weaknesses 

The weaknesses of the tests have been alluded to in analyses of other
tests: the material invariably is in a shallow and minimal context, or
not in context at all; little attention has been paid to levels of
English usage other than standard English; and the authors have not
been much concerned with making the test "culture free," or at least
recognizing the widest possible range of cultural contributions to the
English language and to American society. The tests are also measures
of cognition rather than the ability to perform in the English
language arts; that is, students are not asked to speak, write, or
listen at all and they do not read for a sustained period. Most measures
of ability are taken by inference rather than directly.

Strengths 

The tests can be administered by the average classroom teacher, who
can get a picture of the relative standing of his or her students,
compared with each other or with classes across the country; and the
tests can be administered a part at a time to offset fatigue and to
prevent using several entire days of school for administration of the
full battery.

Metropolitan Achievement Tests (MAT). Primary batteries I and II,
Intermediate, and Advanced; forms F, G, and H for each. Walter N.
Durost, Harold H. Bixler, J. Wayne Wrightstone, George A. Prescott,
and Irving H. Balow. New York: Harcourt Brace Jovanovich,
1970.

Metropolitan Achievement Tests. Elementary battery, form A.
Harold H. Bixler, Gertrude H. Hildreth, Kenneth W. Lund, and J.
Wayne Wrightstone (Walter N. Durost, general editor). New York:
Harcourt, Brace & World, 1958.



Test 1 of the Primary I battery (grades 1.5-2.4), Word Knowledge,
has thirty-five four-choice word items, each based on a single picture.
Test 2, Word Analysis, includes forty four-choice items measuring
pupils' knowledge of sound-letter relationships or skill in decoding,
which are marked in response to an oral question. Test 3 checks
understanding of the reading of sentences; three choices are given to
describe a picture and the pupil is required to mark the correct
sentence. Test 3 also includes the reading of stories of three to seven
sentences in length. The pupil is to check the correct description of
the story. There are forty-two reading sentences and stories
combined.

Four tests in the Elementary battery (grades 3.5-4.9) were
examined: Test 1, Word Knowledge; Test 2, Reading; Test 3,
Language; and Test 4, Spelling. The test of word knowledge asks the
pupil to pick the right word from a series of four which matches a
given word. For example, "A husband is a woman, boy, girl, man."
The Reading test consists of paragraphs of increasing difficulty
which the student must read and answer questions about. A sample
paragraph is "Mother made a cake. She put candles on it. The
candles told how old I was. Mother got ice cream and candy. She got
paper hats. She asked children to come to our house." The questions
are (1) "Mother was getting ready for: Halloween, a birthday,
Christmas, a picnic"; (2) "What did mother put on the cake?
Candles, candy, ice cream, paper hats"; and (3) "The paper hats were
to: eat, light, wear, read."

The Elementary Language test comes in two parts: Part A is
concerned with usage and Part B is concerned with punctuation and
capitalization. Twenty-four usage items are given, the pupil having
to indicate whether the usage is "right" or "wrong." In the
capitalization and punctuation section, fourteen sentences are given and



various parts of the sentence are pointed to. By means of checks or
insertion of capital letters or punctuation marks the pupil indicates
that the questioned element is right as it is or that it needs changing.
In the Spelling test the teacher reads aloud a list of forty words in
sentences to the pupils. The word to be spelled is given, the sentence
is read, and the word to be spelled is repeated.

The Intermediate battery includes Test 1, Word Knowledge, fifty
four-choice items; Test 2, Reading, which includes forty-five
questions based on stories which become increasingly longer and more
difficult; and Test 3, Language, which has a total of 103 items on
usage, parts of speech, punctuation and capitalization, and language
study skills. These are all three-, four-, or five-choice items.
Test 4, Spelling, has fifty words in sentence context that must be
marked as being spelled correctly or not. The Advanced battery
follows the same format as the Intermediate, with the items of greater
difficulty.

The test kits include such items as the Individual Profile Chart,
Class Analysis Chart, Class Record, Raw Score-Standard Conversion
Table, Directions for Scoring, and Directions for Administering.
The amount of material to be read by the teacher is indeed
great, yet the directions are clear and the test can be administered
and interpreted by the classroom teacher.

Uses for English-Language Arts Teachers

The tests measure, to some degree, what is being taught in the
schools, although admittedly most of these items are out of context
as far as language arts activities are concerned. The pupil reacts
rather than acts. The tests can be administered individually; that is,
the Spelling test may be administered at one time and the Vocabulary
test at another. As far as certain selected language arts knowledges
are concerned, the tests measure achievement, with limited
diagnostic uses.

Weaknesses 

The most obvious general weakness of the tests, of course, is that
language arts knowledges and skills are taken out of context. It is
questionable, for example, whether recognizing a misspelled word
and spelling the word correctly when writing call upon the same
skills and knowledges. One might also ask whether a single sentence
can truly provide a context in which either punctuation or
capitalization or usage items can be evaluated. Perhaps a minimum context



would be of paragraph length. A second weakness is centered on the
question of whether or not punctuation and capitalization are as
important as usage, and whether the weighting given to these parts of
the test is defensible. A third weakness is that the tests are pegged to
the ability to use standard English. Finally, with the exception of the
Spelling test, the Elementary battery offers only silent reading tests.
They don't get at the oral language of children at all. One cannot
assume that performance in oral language will be or can be
extrapolated from performance using written language.

Strengths

The directions for administering the test are complete and clear. The
analyses of the test are thorough and thoughtfully done. Provision is
made for the performance of individual classes in taking the test,
and the various subtests have been divided and analyzed rather well.
The test has a good reputation among educators, although it is
usually administered by central office personnel rather than by the
individual classroom teacher.

Comment 

One is struck by the sameness of tests in the English Language Arts.
Although discrete items may vary, the patterns are much the same.
Without resorting to statistical analyses of these tests, one finds little
difference among them. Thus, one is hard pressed to recommend
one test over another.



Science Research Associates Assessment Survey, Achievement Series.
Blue, Green, and Red Levels, form E. Robert A. Naslund, Louis
P. Thorpe, and D. Welty Lefever. Chicago: Science Research
Associates, 1971.



The Science Research Associates Achievement Series consists of a
set of norm-referenced tests which survey general academic progress.
There are two editions in the series, primary and multilevel.
The latter is reviewed here.

The multilevel forms (Blue, Green, and Red) are of graduated,
overlapping difficulty and are intended to cover grades 4 through 9.



Each set contains tests in reading, mathematics, language arts,
social studies, use of resources, and science. Only the Reading,
Language Arts, and Sources tests will be reviewed here.
The tests consist of the following items:



                                      Blue    Green   Red

Reading
  Restate Material                     11      11      15
  Sequence and Summarize                7       5       6
  Draw Inferences                      11      14      12
  Apply to New Situations               6       5       6
  Logical Relationships                13      13       9

Reading Vocabulary
  Phrase Context                       30      30      30
  Story Context                        12      12      12

Language Arts: Usage
  Capitalization                       11       9       8
  Internal Punctuation                 11      13      14
  External and Special Punctuation     13      13      13
  Nouns, Verbs, and Pronouns           19      17      14
  Modifiers and Connectors              6       8      11
  Linguistic Analysis and Diction      10      10      10

Language Arts: Spelling                40      40      40

Use of Sources
  Dictionary                           10       8       8
  Table of Contents                    10       8       8
  Index                                 8       8       8
  References                           12      12       8
  Catalog Cards                  No items       4       8

The specimen test set included a seventy-six-page booklet, Using
Test Results, as well as a Technical Brief and Multilevel Examiner's
Manual, far more than the average teacher will need in order to give
the tests or will want to know about them. However, the discussions in
Using Test Results of such things as "Factors Affecting Test
Results" and "Communicating Test Results" are good teacher aids.
The range of background and exploratory material would aid a
school system which is planning a major achievement assessment of
its instructional programs, researchers and statisticians who wish to
analyze the test development program, and the classroom teacher
who administers the test to his or her students. The important
question, then, is the reliability and validity of the tests.
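
One quick check a reader might run on the item counts tabulated for
the three levels is to total each subtest by level. The sketch below
does that; the dictionary layout and function name are assumptions made
for this illustration, the Blue count for Restate Material is read as
11, and "No items" is recorded as 0.

```python
# Item counts per level (Blue, Green, Red), transcribed from the table.
# Assumptions: the Blue count for Restate Material is read as 11, and
# "No items" for Catalog Cards at the Blue level is recorded as 0.
SRA_ITEMS = {
    "Reading": [(11, 11, 15), (7, 5, 6), (11, 14, 12), (6, 5, 6), (13, 13, 9)],
    "Reading Vocabulary": [(30, 30, 30), (12, 12, 12)],
    "Language Arts: Usage": [
        (11, 9, 8), (11, 13, 14), (13, 13, 13),
        (19, 17, 14), (6, 8, 11), (10, 10, 10),
    ],
    "Language Arts: Spelling": [(40, 40, 40)],
    "Use of Sources": [(10, 8, 8), (10, 8, 8), (8, 8, 8), (12, 12, 8), (0, 4, 8)],
}


def level_totals(rows):
    """Sum a list of (blue, green, red) rows column by column."""
    return tuple(sum(column) for column in zip(*rows))


if __name__ == "__main__":
    for subtest, rows in SRA_ITEMS.items():
        print(subtest, level_totals(rows))
```

If the transcription is right, each subtest has the same number of
items at every level, and only the distribution across skills shifts
from Blue to Red.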



The test is based on the premise that there is and should be a
relationship between instructional programs and testing programs.
Judgments and actions which can affect a school's program can be
test-related if program and tests are compatible. But the testmakers
hedge a bit, and they should, in pointing out that judgments which
affect individual students should not rest on test results as their
sole source. Test results should be used along with the teacher's
observations and the results from other diagnostic measures.

Uses for English-Language Arts Teachers

The Reading test measures the ability to understand and evaluate
material which the student has read and relate what has been read to
other ideas (Comprehension). This is added to a Vocabulary score
based on the recognition of common words, usually presented in a
phrase context.

The Language Arts test is divided into two sections, Usage and
Spelling. Usage measures knowledge and, to some extent, use of
punctuation, capitalization, manner of expression, word and sentence
order, and the organization of ideas. The Spelling test is based
on a recognition of spelling errors.

The Use of Sources subtest measures knowledge of the common
reference tools and guides: the dictionary, the table of contents, the
index, references, and, in the two upper levels, catalog cards.

Weaknesses 

If there is a weakness in these several levels of tests, it must be
centered on the limitation of paper and pencil tests to measure all kinds
of achievement. For example, the assumption is that if students can
choose among several alternative grammatical usage forms, then
there will be an indication of their ability to write the language with
fluency and accuracy. The test materials suggest, for example, that
scores for the Language Arts test indicate the extent of a student's
written language skills, that is, how properly he or she can use the
English language. This is simply not so. The test will measure speed in
making the choices from among language usage items (the tests are all
timed) and accuracy in making the right choice. How well the
student would compose a paragraph or a sentence or a longer essay
cannot be determined and should not be inferred from such a test.
The same question might be asked about capitalization and
punctuation. Choosing correct answers does not indicate how well a
student can capitalize and punctuate written materials. Actually, the



test measures the ability to recognize errors and misusages made by
other people and, again, to indicate these with some degree of
accuracy and speed. Whether the student can write and capitalize and
punctuate correctly is still unknown. A transfer from knowledge to
use is assumed but not proven.

The same might be said for the Spelling test. The score does not
indicate how well a student can spell familiar words. The inference is
made that if a student can recognize an incorrectly spelled form or
spell a word in isolation as opposed to writing it in context, an
indication will be given of the student's ability to spell when writing.
This again is a debatable inference.

The claims for the Reading test are more valid. A student's score
for the Reading test does indicate the level of some reading skills:
how well he or she understands written language and how well he or
she can understand the meaning of written words, phrases, or
sentences. The score on the Comprehension test tells how well a child
can take in information and ideas from stories and essays. The
Vocabulary score indicates how well a child understands the meaning
of words according to their contexts. The Reading score, then, is a
composite score indicating a test-taker's overall achievement level in
the test areas at the time of testing.

A moot question, and the most critical one, is whether the
knowledge areas in which the child was tested are those that are most
critical for effective reading and writing.

Strengths

The tests should be as easy to take at grade 4 as at grade 9, although
the slow reader who is a good speller might have difficulty in
showing it, for this is a silent reading test.

Liberal time limits have been set for the tests. However, the
overall four and one-half hours (two hours and twenty minutes for the
parts reviewed here) can only be viewed as fatiguing.

The interpretive and administrative materials supplied with the
tests are bulky but they should be helpful to teachers, as well as to
administrators who are planning a testing program and those who
are developing comparative data on student achievement. For those
educators who read carefully and recognize test limitations, the
limitations of these tests are spelled out: the tests measure how much
students know about certain things and they are best used in
comparing one group of students with another. The tests are also
reliable. Administration to a sizable number of students and the
analysis by a number of education experts have brought them to this level.



Comment 

The authors apparently make no reference to socio-economic or
cultural differences among the students who might take the test, nor is
recognition given to the fact that urban, suburban, and rural
children will bring different educational attainment levels and concepts
to the testing situation. Once again, the sameness of all language
arts tests is striking. They have been more alike than different for
forty years. While the interpretive materials regarding the tests have
become more complex and sophisticated, the tests themselves have
changed little. We still test spelling ability, for example, on the basis
of recognition of incorrect spellings when we give the child a
standardized test which must be read. The only major differences among
the various tests I have examined, apart from statistical differences,
are refinements in wording, in choice of distractors and stems, and
the variable weighting given to test sections.



Cooperative Primary Tests. Forms 12B and 23B. Princeton, N.J.:
Educational Testing Service, 1965.



These tests probe basic understandings of verbal and quantitative
concepts at the primary school level. The series includes six tests:
Plan-a-test, Listening, Word Analysis, Mathematics, Reading, and
Writing Skills. The tests encompass the end of grade 1 through
grade 3. The Plan-a-test (ten items) is for practice. The rest are given
in this order: Listening, Word Analysis, Mathematics, Reading, and
Writing Skills. They attempt to measure major educational
objectives regardless of particular curriculum programs and methods.
They also attempt to minimize the dependence of one skill upon
another and to be as interesting for children as possible. The tests are
untimed, although the average time for administration is listed in
the testing handbook.

Uses for English-Language Arts Teachers

It must be emphasized that these tests are primarily for teachers in
the primary grades rather than for Language Arts teachers,
although they do measure basic understandings supporting advanced
work in the Language Arts. The Listening tests (fifty items) are tests
of listening comprehension ability. In the main, children simply



select a word from three choices that rhymes with, supplements, or is
the opposite of the word which the teacher says. The Word Analysis
tests (thirty-nine items) measure understanding of structural and
phonetic properties of words. This skill is measured by means of
rhyming words, analysis of syllables and vowels, initial consonants,
ending consonants, and so on. The Reading tests (fifty items) measure
the ability to read words, sentences, paragraphs, and longer passages
with understanding. After initial instructions are given, the children
work on their own. The tests of Writing Skills (forty items) measure
the ability to identify correct spelling, punctuation, and English
usage.

Weaknesses 

One cannot help being struck by the sameness of the tests, whether
they claim to measure achievement, basic understandings, or basic
concepts. I think these are good tests, carefully constructed, with the
items chosen with a considerable amount of care. However, except
for an occasional unique choice of a word, phrasing, or picture, the
tests have little to distinguish them from another dozen or so tests on
the market. Perhaps one who would analyze the lengthy test
handbook with its forty-one tables and discussion of norming, equating,
scaling, and relating would find unique features in the test. But the
average classroom teacher not only will not read the handbook; he or
she will probably be repulsed by it.



Tests of Basic Experiences (TOBE). Levels K and L. Margaret H.
Moss. Monterey, Calif.: California Test Bureau/McGraw-Hill Book
Co., 1970.

The Language test is one of five tests in TOBE and is available in
two forms: Level K, designed for children of preschool or kindergarten
age, and Level L, designed for both kindergarten and first-grade
children. The test is a measure of the child's mastery of certain
concepts which will affect his or her ability to learn further concepts.
The test purports to deal with basic language concepts, including
vocabulary, sentence structure, verb tense, sound-symbol
relationships, and letter recognition. It does so by means of pictures. The
child is asked simply to make a straight vertical line to answer a
question. For example, a picture is shown of four different items:



strawberries, a carton of milk, a birthday cake, and a carton with a
dozen eggs in it. The child is asked to mark the birthday cake by
simply drawing a straight vertical line through it. The test includes
some nonsense items which get at the child's ability to derive
meaning from sentence context. For example, pictures of several items are
shown, including a book of matches, and the following statement is
read: "The boughs burn. Mark the boughs." The book of matches
is, of course, supposed to be marked.

Uses for English-Language Arts Teachers

The test is really not designed for English-Language Arts teachers.
It is more a test for early childhood education where the mastery of
basic concepts is more important than considerations of language
and vocabulary. Such ability is basic to later Language-Arts work,
but it is doubtful that the test should be considered a Language-Arts
test.

Weaknesses 

An obvious weakness is whether the items pictured here truly represent
the most basic concepts that a child can have. Again, there
appears to be no variation for urban, rural, disadvantaged, affluent,
and suburban children. Perhaps there should be.

Strengths 

The test is easy to administer and from all appearances should be
easy for a child to take. The technical data provided are held to a 
minimum and instructions are clear. The information on reliability 
and validity of the test is quite readable. Evidently this is a good test 
which has been standardized, even if it does not belong within the 
Language-Arts category. 



Cognitive Abilities Test. Primary I, form 1; Primary II, form 1.
Robert L. Thorndike, Elizabeth Hagen, and Irving Lorge. Boston:
Houghton Mifflin Co., 1954-68.

This test is designed to assess the development of cognitive abilities
from kindergarten to the first year of college. Primary I, form 1, is
for the second half of kindergarten and grade 1; Primary II, form 1,
is for grades 2 and 3. However, the publisher recommends a variation
in the use of these forms in average communities, communities
with high socio-economic levels, and those with low socio-economic
levels. The grade placement, therefore, is relative and should be
checked on the chart provided with the test.

Both forms consist of a series of five pictures; directions and questions
are given orally. The child simply has to fill up a square or
mark an oval to answer the questions. The ability to perceive and
discriminate is tested, but not the ability to read. Listening is
extremely important. The test is presented as a power test, not a speed
test; thus, it is not timed. It closely resembles the Tests of Basic
Experiences written by Margaret H. Moss.

Uses for English-Language Arts Teachers

The test purports to measure skills basic to learning to read (as well
as learning arithmetic and science). The authors claim that a child
who obtains a low total score on the test is likely to have considerable
difficulty in learning to read and in adjusting to other demands of
the formal school situation. It seems to me that this would follow,
since the abilities to perceive and discriminate are basic to many
learning activities. The test is constructed with items ranging from
easy to difficult. As a matter of fact, some of the advanced questions
in Primary II, form 1, are indeed difficult, and appear to be like
questions which appear on the Army classification test and other
tests of general intelligence.

Weaknesses

A potential, though perhaps not real, weakness of the test is whether
it can discriminate among youngsters and whether it can achieve
its objective of revealing "the full range of individual differences in
kindergarten and grade 1." The second possible weakness is whether
or not the extreme differences created by socio-economic deprivation
and cultural differences have been accounted for. At the time that
the test was originally copyrighted in 1954, such considerations were
not recognized. There is no evidence that the 1968 revision of the test
takes account of this variable. As in tests of this sort, one must always
ask whether or not what are truly basic concepts have been included,
or whether the sample provided is truly representative. It
appears that the effectiveness of the test to no inconsiderable degree
resides in the ability of the teacher to read the oral directions, to
sustain the children's attention to a task for a fairly lengthy period of
time, and to make clear the directions to the test, which at times
might become a bit difficult. I include this example from Primary II,
form 1, to show the difficulty in listening to the test: "Look at the
two boxes all by themselves. (Pause) Now, find a box that shows how
many more sticks there are in the first box than in the second box.
(Repeat) Fill in the oval under the box you chose."

Strengths 

The test has been thoughtfully prepared; it should be easy for most
children of average ability to understand, and the teacher experienced
in administering tests to children should have no difficulty
with it. The technical information on scoring and recording, how to
use the test results, the reliability and validity data, the table of
norms, and the percentiles are held to a minimum and do not
overshadow the test itself. The variation for socio-economic level
recommended by the authors and the kinesthetic use of the finger to
establish the place in the test to prevent the children from becoming
confused are also good points.






Tests on the English Language 

J. N. Hook



In his introduction to this book, Professor Alfred Grommon quotes
a 1971 NCTE resolution that urges the study of "standardized tests
of English ... in order to determine the appropriateness of their
content to actual instructional goals and the appropriateness of test
norms to students" and "the problems in the use and interpretation
of tests." This section is an attempt to re-examine tests of the English
language with those mandates, particularly the first, in mind.

The tests reviewed here, if the various forms are counted separate- 
ly, total well over a hundred. Reading the thousands of items con- 
tained in them enables the reviewer to draw a few conclusions about 
the largely unsatisfactory state of the art. 

Very noticeable is the narrow coverage of the tests. One finds 
nothing about dialects. Nothing about history of the language. Al- 
most nothing directly about semantics, unless one counts vocabulary 
items as semantics. Nothing about etymology. Little, except in a very 
few tests, about the actual working of the English sentence. All these 
aspects of the language are becoming of increasing importance in 
the English curriculum, but the makers of the tests reviewed here 
have not yet caught up with the profession. Once they are included 
in tests, such facets of English are likely to attain even more frequent 
inclusion in courses of study, for tests do influence curriculum. 

Still more obvious is the testmakers' concern for "correctness."
Some three-fourths of all the items in these tests ask the students to
determine whether or not something is "correct": a spelling, a
choice of verb, an arrangement of sentence parts, etc. These tests
confirm the stereotype of the English teacher as someone primarily
interested in catching someone making errors: "Oh, if you're an
English teacher, I'd better be careful of what I say."

The urge to measure "correctness" is of course an outgrowth of
what English teachers have stressed, and the general public has
expected or demanded, for many years. So testmakers should not be
blamed for supplying what the buyers have wanted. But the resulting
tests, unfortunately, have a built-in cultural bias. Students most
likely to do well on them are those from educated, white, Anglo-Saxon
homes, who heard "good English" while lying in their baby
cribs and have seldom heard anything else. Students less likely to do
well come from less-educated families, are often nonwhite, and
frequently come from homes where a language other than English is
the first or only language spoken. Test results for "correctness," in
all fairness, should not be used to make invidious comparisons
among students. And if "correctness" is less important than other
aspects of language use, such as clarity, directness, and effectiveness,
perhaps testmakers should de-emphasize it and try harder to
measure what is most important.

Some of the testmakers have simply not kept up with developments
in the scholarly study of usage. In consequence, they count as
wrong a number of answers that usage reports like those of Margaret
Bryant, Bergen and Cornelia Evans, and Raymond D. Crisp
show are established, and some (e.g., past tense sunk and sung) that
are included without comment in dictionaries as reputable as
Webster's Third. Also, in almost none of the tests is social context
considered: the fact that a usage not suitable for a very formal paper
may be quite satisfactory in ordinary conversation or in a letter to a
good friend.

Artificiality of items varies from test to test. Occasionally, the reviewer
wonders whether some of the testmakers have ever seen the
inside of a classroom or read a composition by a child; some of the 
sentences they offer for student reaction were not just dreamed up 
but may have been nightmared up. 

Despite protestations in some of the manuals, most of the tests reveal
students' ability to recognize but not necessarily their ability to
perform. Thus time after time students are asked to identify which
one in a group of words is misspelled, but seldom are they asked to
spell. Time after time they are asked whether a word is used correctly,
but seldom are testmakers ingenious enough to compose test
items requiring students to show whether or not they use the word
correctly. Scoring ease is usually the villain; a test of student
performance rather than of recognition ordinarily takes longer to score
and hence is avoided, although a few clever testmakers have gone far
to whip this problem.

Review of these tests shows still another serious deficiency: the
lack of adequate diagnostic instruments. For instance, though a
spelling test may reveal that a given student scores only 70 percent, it
does not show what the student's spelling problems are; if it did offer
a diagnosis of individual cases, remediation might be easier. Similarly,
there should be diagnostic tests that indicate major problems
each student has with verbs and pronouns, and with sentence
construction. At present, only by time-taking, unguided labor can a
teacher find a student's specific areas of weakness, once the test
results are available. Something much more specific than just a total
score on spelling or on grammar is needed. Testmakers would serve
the profession better than they have if they could come up with some
good diagnostic instruments instead of concentrating on achievement.

As the reviews indicate, a few favorable comments may be made
on a number of the tests. One is that testmakers have largely
abandoned the old-fashioned items dealing with mere grammatical
identifications: picking out subjects or indirect objects, labelling
adjectives and adverbs, identifying complex sentences, and the like.
Another is that some testmakers escape several of the criticisms
made above, although none escapes entirely. There is at least one
test that tries to measure knowledge of how the sentence works;
there are a few that require the student to spell a word and not just
recognize that it is misspelled; and there are a few that pay considerable
attention to sentence effectiveness and not just correctness.
Some testmakers have displayed considerable ingenuity in their
development of tests that are interesting to take and fairly useful in
what they reveal. Improved versions of such tests may be of value in
making measurements useful for purposes of accountability.

But there is still far to go before the profession is well served by
those who construct tests of the language. Several excellent English
language tests or batteries of tests should be available from which
teachers may choose in light of the needs of their own students. At
present the range is from very poor through poor, and fair to good
but limited. Still to be reached is excellent and extensive.

The reviewer cannot assert that the criteria he followed in his
evaluation are the only ones possible, or that they are assuredly the
best. In any case, the basic criteria followed are these.

1. Study, and therefore testing, of the English language should be
broadly conceived to include language history, semantics, dialects,
grammar (in the sense of description of the actual
arrangements and workings of the English sentence), usage,
spelling, punctuation, principles of word choice, and accuracy
and extent of vocabulary. Some borderline areas (spelling,
punctuation, word choice) may justifiably be treated in either
language or composition tests. In fact, these two areas often
overlap. All the branches of English language study listed
obviously cannot be included in one test, but language tests taken
as a whole should cover them all.

2. Items chosen should be as free of cultural bias as possible. 

3. Language use rather than theory should be stressed. 

4. Tests should be measurements of students' ability to do rather
than their ability to recognize.

5. Debatable items in spelling, usage, punctuation, etc., should be 
excluded. 

6. Tests should in general reflect the standards of language use 
that are generally characteristic of the writing found in rep- 
utable books and magazines in the second half of the twentieth 
century; that is, they should be up to date rather than a reflec- 
tion of the writing of years ago. (Ideally, since language is at 
base a spoken thing, tests should be no less concerned with the 
spoken language, but effective pencil and paper tests of spoken 
language may by definition be impossible.) 

7. Results of tests should be usable in planning improvements in a
school's language program and in individual evaluation and
diagnosis.






Evaluation and Adjustment Series, Brown-Carlsen Listening
Comprehension Test. Forms AM and BM. James I. Brown and G. Robert
Carlsen (Walter Durost, general editor; Harry A. Greene, coordinator
for Language Arts tests). New York: Harcourt, Brace &
World, 1953 and 1955.



This test is designed to measure comprehension of the spoken language
at high school and college levels. The test-taker is supplied
with only a special answer sheet. The administrator reads the
seventy-six test items aloud. Time required is "one class period," i.e.,
about forty minutes.
Part A, Immediate Recall, consists of seventeen items in the form

"In the series of numbers 4-5-3-2-1 the first number is ___."

Part B, Following Directions, tells students to perform twenty simple
operations with a set of numerals and letters printed on the answer
sheet. Part C, Recognizing Transitions, consists of eight sentences
read without context, to be identified as introductory, transitional,
or concluding. Part D, Recognizing Word Meanings, requires students
to select from a list on the answer sheet the meanings of ten
words in context, e.g., "The soldiers pitched their tents." And in
Part E, Lecture Comprehension, students listen to a twelve-minute
"lecture" on vocabulary-building and then answer twenty-one questions
concerning both details and main ideas.

Estimate of Validity and Usefulness

This is a well-conceived, rather imaginative test that has not been
superseded in the many years since it was constructed. None of the
test items has become seriously outdated.

One slight weakness is that the test items must be read aloud by 
the examiner. Some examiners do not enunciate clearly; some will 
inevitably pause longer than others while the students choose their 
responses. In consequence, percentile ranks of students, in relation
to national norms, may be affected by the examiner. If the test were 
put on a record or a tape, this problem could be eliminated. 

As the manual says, "The mere administration of the test is likely
to awaken in students a recognition of the importance of listening
skills and an understanding of the fact that people vary greatly in
their listening ability just as they do in most other characteristics."
Once such understandings exist, students may be motivated to
undertake various class and individual projects to improve their
listening skills.



California Achievement Tests (CAT). Levels 4 and 5, forms A and B
for each. Ernest Tiegs and Willis W. Clark. Monterey, Calif.:
California Test Bureau/McGraw-Hill Book Co., 1970. (See Jenkins
review p. 61 for Levels 1-3.)



The California Achievement Tests consist of a series of test batteries
in five overlapping levels with alternate forms A and B. Only Levels
4 and 5, for junior and senior high school use, are described here. A
complete battery consists of tests in reading, mathematics, and
language.

The contents of the language tests are as follows: 



                         Items, Level 4     Items, Level 5

Mechanics                72                 80
  Capitalization         40 (7 min.)        40 (9 min.)
  Punctuation            32 (14 min.)       40 (16 min.)
Usage and Structure      50 (14 min.)       54 (14 min.)
Spelling                 32 (8 min.)        32 (8 min.)

Totals                   154 (43 min.)      166 (47 min.)
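
The figures in this table can be checked against one another: Mechanics should equal Capitalization plus Punctuation, and the totals should equal the sum of the four timed sections. The following is only an illustrative sketch of that arithmetic; the dictionary layout and the function name `summarize` are assumptions made for the example, not anything supplied by the test publisher.

```python
# Item counts and time limits (minutes) copied from the table above.
# Each entry is (number of items, minutes allowed).
levels = {
    "Level 4": {
        "Capitalization": (40, 7),
        "Punctuation": (32, 14),
        "Usage and Structure": (50, 14),
        "Spelling": (32, 8),
    },
    "Level 5": {
        "Capitalization": (40, 9),
        "Punctuation": (40, 16),
        "Usage and Structure": (54, 14),
        "Spelling": (32, 8),
    },
}

# Figures reported in the table: (Mechanics items, total items, total minutes).
reported = {"Level 4": (72, 154, 43), "Level 5": (80, 166, 47)}


def summarize(sections):
    """Return (Mechanics items, total items, total minutes) for one level."""
    mechanics = sections["Capitalization"][0] + sections["Punctuation"][0]
    total_items = sum(items for items, _ in sections.values())
    total_minutes = sum(minutes for _, minutes in sections.values())
    return mechanics, total_items, total_minutes


for level, sections in levels.items():
    computed = summarize(sections)
    status = "matches" if computed == reported[level] else "differs from"
    print(f"{level}: computed {computed} {status} the reported figures")
```

Both levels come out consistent with the reported Mechanics subtotals and overall totals.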



Each capitalization test consists of two "stories" and some sentences,
divided into lines of no more than five words. A line may or
may not contain an error in capitalization. If an error exists in a line,
the student is to give the number of the word wrongly printed. The
punctuation tests are similar in format; five punctuation marks (period,
question mark, exclamation point, apostrophe, and comma) are
involved. The first part of each usage and structure test consists of
twenty-eight or twenty-nine sentences that may or may not be written
in standard English. The student marks T(rue) for Standard and
F(alse) for Nonstandard, possibly a trifle confusing. Next come a
few sentences concerning transformations, specifically, whether a
given sentence can or cannot be transformed into other specified
sentences. Then there are six sentences to be classified as complex,
compound, simple, or fragmentary. Finally, there are about ten sentences
in which some sort of grammatical identification is called for,
e.g., the type of pronoun that somebody is. Each spelling item consists
of four words, with the student indicating which word, if any, is
misspelled.

According to the Test Coordinator's Handbook, the tests attempt
to synthesize traditional, structural, and transformational grammars,
so that "the nature of language is really examined."

Estimate of Validity and Usefulness 

As is true of most tests of capitalization and punctuation, this
requires recognition rather than doing. A few items are questionable.
Journalists write Mississippi river and some book publishers also
prefer river to River; the test gives credit only for the capital. For
"Wow what an idea!" only a comma is regarded as suitable after
Wow, so a junior high school student, overlooking the subtle difference
created by the small letter in what, is penalized if he or she says
an exclamation point is acceptable. In neither the capitalization nor
the punctuation sections is the matter of unneeded capitals or punctuation
taken up. Yet all teachers know that many students capitalize
and punctuate excessively.

Some of the usage and structure items may be criticized. Why, for
instance, is everyone called a personal pronoun in test 5B? Why do
these testmakers still beat the who-whom horse? If 79 percent of
seniors consider an item standard (Usage, 5A, item 29), can testmakers
say with assurance that it is nonstandard? Another question
(Usage, 4A, item 6) penalizes the student who thinks that sunk
may be used in standard English as a past tense, but even the
conservative American Heritage Dictionary says that it may be. Item 18
in the same test implies that myself must not be used for me as part
of a compound object, but the Evans usage dictionary calls the
construction "normal spoken English." In 4B, "The boys and they all
helped yesterday" is called nonstandard. The sentence is certainly
stiff and ugly, and you and I wouldn't say or write it, but what is
nonstandard about it?

About an eighth of the items in usage and structure deal specifically
with transformational grammar, asking whether a given sentence
can be transformed into another given sentence. Those questions
are only slightly technical; most can be answered even by students
who haven't studied TG. But the ones who have obviously
possess a slight edge.

A few more caveats. The answer book says that "Gary and his
father like to play pool because it is a game of skill" is a compound
sentence. Oh? And what part of speech would you say fits the
blank: "The fat pigeon waddled ______ to the curb"? Is it a
conjunction, a noun, an adjective, an adverb, or none of these? The
answer book says "none of these." But wouldn't a word like over or
close be an adverb here? And students may lose a point if they
don't know what a "participle inflectional morpheme ending" is.
Only 17 percent of high school seniors guessed right on that one. The
law of averages says that 20 percent should.
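
The 20 percent figure is what blind guessing yields on an item with five answer choices. A minimal sketch of that arithmetic follows, assuming five options for the item (the review does not list them; the count is inferred from the 20 percent figure) and using a hypothetical helper name, `chance_rate`.

```python
def chance_rate(n_options: int) -> float:
    """Expected share of correct answers if every student guesses blindly."""
    return 1 / n_options


# Assumption: five answer choices, inferred from the review's 20 percent figure.
n_options = 5
observed = 0.17  # share of high school seniors answering correctly, per the review

expected = chance_rate(n_options)
print(f"chance: {expected:.0%}, observed: {observed:.0%}")
print("below chance" if observed < expected else "at or above chance")
```

An item that students answer correctly less often than chance would predict is usually one whose wording or keyed answer misleads them, which is the reviewer's point.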

The words in the spelling tests seem appropriate for the designated
grade levels. Students, though, are once more required only to
recognize and not to perform.

The supplementary materials for these tests are among the best 
available anywhere. The tests themselves, despite their problems,
are better than most, yet it is too bad that the numerous small flaws 
were not eliminated. Many thousands of dollars must have been 
spent on these tests. For just a few dollars more, a couple of English 
experts could have made the tests considerably better. 



California Short-Form Test of Mental Maturity. Level 3, S-form.
Elizabeth T. Sullivan, Willis W. Clark, and Ernest W. Tiegs.
Monterey, Calif.: California Test Bureau/McGraw-Hill Book Co., 1963.



Of the 120 items in this test, twenty-five are in a verbal comprehension
(vocabulary) section, in which the test-taker selects synonyms.
Also of some interest to a teacher of English is a memory test. For
this, a selection is read to the students at the beginning of the test
period, and at the end they are asked questions about it. This section,
then, tests not only memory but also listening ability. The
remaining five sections of the test are nonverbal (three), involving
recognition of pictorial opposites, similarities, and analogies; and
numerical (two), involving solution of simple arithmetical problems.
(Some of the illustrations used in the nonverbal section are rather
indistinct.) Total working time for the test is forty-one minutes. A tape
recording is available for administration.

Tests on other levels, 0 (Pre-Primary) through 5 (Adult), are available
but were not examined for this report.






Estimate of Validity and Usefulness

No attempt is made here to estimate the worth of this test for mental
measurement. In the vocabulary section, the words seem of appropriate
difficulty for junior high years.



Clerical Skills Series. New Rochelle, N.Y.: Martin M. Bruce, 1966.



The English-related tests in the series are Word Fluency, seventy-five
partial words, e.g., Po__, to be completed by the test-taker
to make any word he or she knows of four letters or more (five
minutes); Grammar and Punctuation, forty sentences that may or
may not have an error in grammar or punctuation (untimed); Spelling,
ninety misspelled words "obtained from typewritten letters," to
be spelled correctly (untimed); Vocabulary, fifty multiple-choice
items (untimed); and Spelling-Vocabulary, sixty items requiring the
test-taker to recognize an incomplete word and spell it correctly,
e.g., "DEF__T: win over, vanquish" (untimed).

Estimate of Validity and Usefulness

The Word Fluency test is a simple but interesting device that probably
reflects quickness of mind at least as much as size of vocabulary.
The Grammar and Punctuation test contains a few arguable
items: "Data proves" is considered wrong, as are hung for hanged
and "it was me." Only five of the forty items contain errors in
punctuation. The Spelling test has the virtue of requiring the test-taker to
spell, not just recognize. A few of the words chosen seem a bit odd:
Who today is likely to need to spell /et/orti? Misspellings like it'(«/c
and c'5e('//cT may puzzle some test-takers.

Some of the words in the Vocabulary test are unlikely ever to be
used again by most test-takers: shah, treadle, cresset, toucan, doge,
defedation, trier, and pemile. In fact, Webster's Third labels
defedation archaic, and "lever," the test's definition for treadle,
isn't quite satisfactory. The Spelling-Vocabulary test is a simple but
ingenious way to check simultaneously on acquaintance with the
spelling and the meaning of words of the approximate difficulty of
fiighnenx and z(ea)lot.

It is interesting to note that, according to the manual, the Word
Fluency, Vocabulary, and Spelling-Vocabulary tests all have a high
correlation with the Otis Mental Ability Test.



College Qualification Tests. Test V, form A. George K. Bennett,
Marjorie G. Bennett, Wimburn L. Wallace, and Alexander G.
Wesman. New York: The Psychological Corporation, 1956.



Test V is a fifteen-minute verbal test, one of three parts of a battery
intended to be "broadly predictive of college success." The other
parts are a numerical test and a wide-ranging information test. The
test consists of seventy-five vocabulary items, fifty of which require
identification of synonyms and twenty-five, identification of
antonyms. Four choices are given for each item.

Estimate of Validity and Usefulness

The first half of the synonym section and most of the antonym section
consist of words that any reasonably well-read high school
senior should know. The remaining twenty-five synonym items are
more rarely used words of the approximate difficulty of voracious
and eschew. Median score for college freshman men is 44-45; for
college freshman women, 48-49. High school teachers who are interested
in finding how their students compare with college freshmen in
vocabulary can get a quick approximate answer in this test.



Comprehensive Tests of Basic Skills (CTBS). Levels 3 and 4, forms Q
and R. Monterey, Calif.: California Test Bureau/McGraw-Hill
Book Co., 1968. (See Jenkins review p. 63 for Levels 1 and 2.)

The CTBS complete battery consists of ten tests in reading vocabulary,
reading comprehension, language mechanics, language expression,
spelling, arithmetic (3), and study skills (2). Levels 1 and 2
are for grades 2.6-6; this review concerns Levels 3, grades 6-8, and
4, grades 8-12, only. The publishers claim that "these tests aim to
measure . . . those skills common to all curriculums and needed for
success in using language and number skills in any school in which
the students of our mobile population find themselves."
Items in the language tests are as follows:

Mechanics: twenty-five items (thirteen punctuation, twelve
capitalization) based on underlined and numbered portions of a letter
and an article; the student indicates whether the punctuation
and capitalization are correct or, if not, what the best
alternative is. Eleven minutes.

Expression: thirty items (ten usage, ten appropriateness, ten
economy and clarity) based on passages of prose and poetry;
the student chooses the best alternative among suggested answers.
Sixteen minutes.

Spelling: thirty items; the student selects an incorrectly spelled
word in a group of four, or observes that none is incorrect.
Eight minutes.

Estimate of Validity and Usefulness

The provision of context for most of the language items in CTBS
may be helpful. However, there are small but annoying flaws in these
contexts: article is misspelled aeticle; the punctuation is incorrect in
the line "A hunter, keen and brave, as he"; and the poetry examples
are doggerel. In fact, the contextual prose, as well as the poetry, is
not well written. And no one except somebody from the Far West
would say that the home of the Green Bay Packers is in the East.

Other flaws are also much too numerous. According to Level 4,
form Q, "We laugh we cry we learn" should be punctuated with
commas, but according to Level 4, form R, "We laugh we cry we live
other people's lives a while" should be punctuated with semicolons.
The test authors are apparently unaware of restrictive appositives,
for they insist on a comma in "the phrase [,] 'Form follows function'"
and in "the column [,] 'Shop by Mail.'" In Brooklyn zoo,
according to the testmakers, zoo should be capitalized, despite the
fact that newspapers and many magazines would use lowercase. In
the spelling section, I have no certainty about what word the misspelling
vR'm is supposed to represent (possibly viands, but that is a
much rarer word than most in the test). Also in spelling, how realistic
is it in our urbanized society to expect a junior high school student to
know about "alphalpha" (alfalfa)? There are still more of such
blunders or blemishes, small but indicative of lack of sufficient care.

It seems unfortunate, too, that testmakers who measure skills
have so far been unable to find ways of measuring ability to do rather
than just to recognize. Presumably one who recognizes a mark,
usage, or spelling as right or wrong will also be likely to use the
"correct" one in one's own writing. But the presumption does not always
prove valid. Every teacher has known students who can do well on
recognition tests like these but who make numerous mistakes in
their own papers.

In the usage test, the idea of having some items dealing with
appropriateness of diction is a good one, as is the idea of having ten
items in which the student must choose the most clear and economical
of four versions of a part of a sentence. To make room for these
kinds of items, the testmakers had to sacrifice some of the usual
items, such as wrote vs. written or he vs. him. The decision was a
wise one.



Concept Mastery Test. Form T. Lewis M. Terman. New York: The
Psychological Corporation, 1950.



This test, designed for college upperclassmen and graduate students,
is intended as a measure of "ability to deal with abstract ideas at a
high level." It has no time limit but ordinarily takes about forty minutes.
Its subject matter is derived from "a wide variety of subject
matter fields, such as physical and biological sciences, mathematics,
history, geography, literature, music, and so forth."

The test has two parts. In the first, the test-taker indicates whether
two words are synonyms or antonyms. The difficulty increases, so
that near the last of the 115 items the test-taker may encounter
words like sempiternal and transilient, a word not included in three
of four desk dictionaries. Part II, Analogies, is in the form "Shoe :
Foot :: Glove : (a. Arm b. Elbow c. Hand)" and consists of seventy-five
items.

Estimate of Validity and Usefulness

Norms given in the manual include scores for the intellectually gifted
children studied by Terman from 1921 on; 1004 of them took this
test in 1951-52 and had a mean score of 136.7 out of 190, far higher
than graduate students in general or electronic engineers and scientists.
Air Force captains (344 of them) scored an unbelievably low
60.

This is an amusing test for the intellectually minded, but it
certainly can also do what is intended: serve as a measure of ability to
deal with abstractions, although not all are on a high level. One
must remember, of course, that there are other ways to deal with
abstractions besides thinking in terms of synonyms, antonyms, and
analogies, but those kinds of classification are the most basic.



Cooperative Academic Ability Test (AAT). Forms A and B. Prince- 
ton, N.J.: Educational Testing Service, 1963. 



The AAT is comparable in purpose to the School and College Ability
Tests (SCAT), the Scholastic Aptitude Test (SAT), and the American
Council on Education Psychological Examination (ACE). It is not an
intelligence test, but a test that "measures skills in handling certain
specific kinds of verbal and mathematical material." It is called a
test of power rather than of speed. (Note: the SCAT manual says
that SCAT Series II tests, forms 1A and 1B, were formerly the
Cooperative Academic Ability Test; SCAT, however, exists on four
levels, AAT on only one.)

This test consists of two parts, verbal and mathematical, with a
total working time of forty minutes. The verbal part (twenty minutes)
contains fifty analogies of this type: "tinkle: bells:: A. whistle: tunes
B. glide: snakes C. rustle: leaves D. wrinkle: fabrics."

Estimate of Validity and Usefulness

As the test manual says, "Verbal analogies have long proved a
respectable part of aptitude batteries." They necessarily combine at
least a modest command of vocabulary with ability to see
relationships. In this test, many of the items consist of simple, tangible
things like water and paint, but other words are of a difficulty
approximating that of dok or inimical. Obviously, the student who
does not know inimical cannot reason about its analogy with some
other word. The analogies represented in the individual test items all
appear fair and logical.






The manual says, "The Cooperative Test Division recommends
the use of percentile bands in interpreting scores to students. Bands
suggest the imprecision characteristic of all such test scores (each
band is two standard errors of measurement wide) and serve as
guards against such interpretations as this: Jones's score is 152 and
Smith's is 151, so Jones is better than Smith! If two bands
overlap . . . one is not justified in concluding that there is any real
difference in the two standings. If the two bands do not overlap, one is on
firmer ground in talking about a difference." Words like these
should be printed in large boldface type in every test manual,
because testing is still an inexact science and is likely to remain so.
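
The band comparison described in the manual can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the standard error of measurement used here (3.0) is a made-up value,
not one taken from the AAT manual, and the function names are
hypothetical. Following the quoted description, each band is two
standard errors wide in total, so it is assumed to extend one standard
error on either side of the obtained score. The sketch works in score
units, whereas the manual's bands are expressed as percentile ranks.

```python
def score_band(score, sem):
    """Return (low, high) for a band two standard errors of measurement wide.

    Assumption: the band is centered on the obtained score, so it runs from
    score - sem to score + sem (total width 2 * sem).
    """
    return (score - sem, score + sem)


def bands_overlap(band_a, band_b):
    """True when two (low, high) bands share at least one point."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]


# Hypothetical standard error of measurement; not taken from the manual.
sem = 3.0

jones = score_band(152, sem)
smith = score_band(151, sem)
other = score_band(140, sem)  # a third, invented student for contrast

# Jones and Smith: the bands overlap, so no real difference should be claimed.
jones_vs_smith = bands_overlap(jones, smith)

# Jones and the third student: the bands do not overlap, so one is on
# firmer ground in talking about a difference.
jones_vs_other = bands_overlap(jones, other)

print(jones, smith, jones_vs_smith)
print(jones, other, jones_vs_other)
```

With these assumed values the one-point gap between Jones and Smith
disappears inside the bands, which is exactly the point the manual is
making.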

The manual reports a validity coefficient of .52 for the verbal test
of AAT with regard to class rank of 518 students. This statistic
shows a moderately high positive relationship and suggests that the
verbal part of AAT may be a pretty good predictor of academic
achievement.
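
One common way to gauge what a coefficient of this size means is to
square it, which gives the proportion of variance in one measure that
is statistically associated with the other. The short Python sketch
below does that arithmetic for the reported .52. The variable names are
illustrative, and squaring the coefficient is a general rule of thumb,
not a procedure described in the manual.

```python
# Validity coefficient reported for the verbal test of AAT against class rank.
r = 0.52

# Squaring a correlation gives the share of variance the two measures have
# in common (the coefficient of determination).
shared_variance = round(r ** 2, 4)

# The remainder is variation in class rank not accounted for by the test.
unexplained = round(1 - shared_variance, 4)

print(f"shared variance: {shared_variance:.1%}")
print(f"not accounted for: {unexplained:.1%}")
```

On this reading, roughly a quarter of the variation in class rank goes
with the verbal score, which is consistent with the cautious phrase
"may be a pretty good predictor."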



Cooperative English Tests: English Expression. Forms 1A, 1B, 2A,
2B, and 2C. Princeton, N.J.: Educational Testing Service, 1960.



The tests are similar in structure but vary in difficulty. Each test has
a fifteen-minute, thirty-item Effectiveness section and a twenty-five-
minute, sixty-item Mechanics section. The levels covered are grades

The Effectiveness section is largely concerned with vocabulary,
but instead of the usual pick-a-synonym variety, this test offers a
sentence with one word left out. The test-taker is asked to fill in from
a list of four words the one most appropriate to the context. About a
third of the items, however, offer four variations of the same
sentence; the test-taker chooses the most effective of the four.

The Mechanics section consists of items like this: 

 E When all the marks were
 F added together, his standing
 G was fourth in the class.

The test-taker marks G on the answer sheet to correspond to the line
containing an error (or O if there is no error).






Estimate of Validity and Usefulness 

The Effectiveness section might be improved if there were about an
even balance between vocabulary and other items, so that certain
sentence problems not illustrated by this test could be included.
However, the format of the vocabulary items in this test seems to this
reviewer much superior to that of the usual vocabulary test. A
student who chooses the most effective word from a group of four
obviously not only knows the definition of the word but also recognizes
its suitability to a given context; further, the student shows that he or
she can distinguish it from three near-synonyms. In two or three
instances, though, more than a single choice could be defended.

As is true of most tests on mechanics, this one has some picky or
even questionable items, similar to "most every day" and "alright,"
both of which are here considered unqualifiedly wrong. In general,
though, the items are well chosen; their selection was based upon a
study of the frequency of student errors. Items in the high school test
tend to contain elementary illiteracies like "had went" and "my
sister she"; items in the college test are definitely more sophisticated.
In the items as a group, there is a good balance of spelling,
punctuation, and usage problems.



Cooperative School and College Ability Tests (SCAT). Series II:
Level 1, forms A, B, and C; Level 2, forms A and B; Level 3, forms A
and B; and Level 4, forms A and B. Princeton, N.J.: Educational
Testing Service, 1966.



SCAT tests are verbal and mathematical, covering grades 4-14. The
twenty-minute verbal section of each test consists of fifty analogies of
the form, "calf: cow:: A. puppy: dog B. nest: bird C. horse: bull D.
shell: turtle." The tests were "designed to provide estimates of basic
verbal and mathematical ability," and thus to serve "as a measure of
a student's ability to succeed in future academic work."

Estimate of Validity and Usefulness

Statistics in the manual show that when comparisons are made
between SCAT scores and academic performance, the tests "can be
useful as predictors of academic success. . . . It should be noted,
however, that there is variation in the correlation coefficients
obtained at various schools, and when possible schools should conduct
their own studies on the usefulness of the tests for their purposes."
In such studies, analysis of scores made by the same students as they
progress through the grades should be of interest. Validity statistics
are also offered to show the relationships between SCAT and rank in
graduating class. For the verbal section the coefficient was -SX

As to the items in the test, the analogies appear uniformly fair and
apt. Inevitably, of course, a verbal analogy test is to an extent also a
vocabulary test; thus a fourth grader unfamiliar with words like
luggage or comprtband will not be able to recognize analogies in which
those words are used. But, of course, every test in which the student
must read something is in part a vocabulary test.



Differential Aptitude Tests (DAT): Spelling, Language Usage, and
Verbal Reasoning. Forms S and T. George K. Bennett, Harold G.
Seashore, and Alexander G. Wesman. New York: The Psychological
Corporation, 1973.



These are three parts of a much larger battery designed to indicate,
in a general way, vocational aptitude of students in grades 8-12. The
Spelling test consists of 100 words which are to be designated as
correctly or incorrectly spelled. It requires ten minutes. The Language
Usage test, which requires twenty-five minutes, consists of sixty
items in this form:

Ain't we / going to / the office / next week? 
A B C D 

The test-taker is to mark A on the answer sheet for this item because
of the "error" in that segment.

Estimate of Validity and Usefulness

As is too often true of spelling tests, this one requires only the
recognition of rightness or wrongness, which is hardly the same as spelling
the word. After all, recognizing a cauliflower is hardly the same as
growing a cauliflower.

In an earlier edition of the usage test, many of the items were
extremely trivial. This edition is a decided improvement, with nearly
all of the "wrong" items being of the sort that would be rather
offensive in formal communication.

Students are told, "If you do well on this test [Spelling] and on
Language Usage, as well as on Verbal Reasoning, you should be able
to do almost any kind of practical writing, provided you have a
knowledge of your topic and a desire to write about it." True, no
doubt. But the claim that the usage test "is among the best general
predictors of course grades in high school and college" makes one
wonder whether high schools and colleges tend too much to reward
mastery of superficial form rather than substance.

Verbal Reasoning Test 

This test consists of fifty items of the form "______ is to water
as eat is to ______" followed by five pairs of words. The working
time for this test is thirty minutes.


Estimate of Validity and Usefulness

The DAT battery is intended chiefly for use in counseling of
students. Recommendations can be made concerning general kinds of
occupations for which the battery suggests they are suited. The
Verbal Reasoning test alone would not be very helpful to either a
counselor or an English teacher, although the probability would seem to
be that a student scoring high in it would be successful in college or
in any kind of work requiring verbal competence and reasoning.
Students are told that anyone with a combined rating at the 75th
percentile or better in the verbal reasoning and numerical tests "should
consider himself capable of performing well in college courses"; a
rating above the 50th percentile "also indicates college potential";
but it is "arguable" whether students in the third quarter should
undertake liberal arts and science programs.



Essentials of English Tests. Forms A and B. Dora V. Smith and
Constance M. McCullough (revised 1961 by Carolyne Greene).
Circle Pines, Minn.: American Guidance Service, 1940 and 1961.



The test has five parts, with possible raw scores as follows: Spelling,
25; Grammatical Usage, 44; Word Usage, 15; Sentence Structure,
20; and Punctuation and Capitalization, 53. The Spelling test
requires the test-taker to decide whether a word is or is not spelled
correctly and to rewrite it if incorrect. The Grammatical Usage test
presents questioned items in paragraph context; most of the items are
verbs and pronouns. The test-taker writes the "correct" form of each
item he or she considers wrong. In Word Usage, the student hunts
out and corrects expressions like off of and party (for person). The
Sentence Structure test asks the student to select from a group of
four sentences the one that "most correctly and effectively states the
idea." The Punctuation-Capitalization test requires the student to
insert punctuation marks or capital letters where needed in two
passages, one of which is a letter. The total working time for the test is
forty-five minutes.

Estimate of Validity and Usefulness

This is an unpretentious test. The Manual of Directions consists of
six pages, which cover succinctly what some testmakers include in
two or three booklets of twenty or thirty pages each. "The authors,"
says the manual, "are more concerned that teachers interest
themselves in the performance of individual pupils than in any group
comparisons." Hence, a Diagnostic Key to Error is provided to
guide the teacher in deciding what points to stress with the whole
class, and in suggesting "the manner in which individual pupils
should be grouped in order to accommodate individual needs." This
emphasis is laudable. If, for instance, no children in a class have
difficulty with such verbs as come and why should class time be
wasted on instruction in the usage of those verbs?

One may quibble with some of the test items. For instance, an
eating place is named "the dog in the bun"; the student is to write this
as "the Dog-in-the-Bun," and loses a point if he or she leaves out the
hyphens. Are the hyphens really needed, and if they are, is a
commercial establishment likely to use them? (I recall no hyphens in
Chock Full o' Nuts.) The what in "I didn't know but what he would
refuse" should be changed to that according to the test, but Bergen
and Cornelia Evans, in A Dictionary of Contemporary American
Usage, say that "I don't know but what it's all true is acceptable
English in the United States." Multiple answers are also defensible for
a few of the items in the Sentence Structure segment. And the
allocation of over a third of the items to punctuation and capitalization
seems unreasonably high.
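
The "over a third" figure can be checked against the possible raw
scores listed in the description of the test. The Python sketch below
is only a quick arithmetic check; the dictionary and variable names are
illustrative, and it assumes that each raw-score point corresponds to
one item.

```python
# Possible raw scores for each part, as listed in the description above.
parts = {
    "Spelling": 25,
    "Grammatical Usage": 44,
    "Word Usage": 15,
    "Sentence Structure": 20,
    "Punctuation and Capitalization": 53,
}

total = sum(parts.values())

# Share of the total possible score contributed by each part.
shares = {name: points / total for name, points in parts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption, Punctuation and Capitalization accounts for 53
of 157 points, a little more than one third of the whole test.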

In general, despite a few such quibbles, these simple, uncluttered
tests are at least as good as others whose publishers have developed
much more paraphernalia.






Fundamental Achievement Series (FAS). Verbal Form B. George
K. Bennett and Jerome E. Doppelt. New York: The Psychological
Corporation, 1969.

The FAS tests are advertised as "culture-relevant," "fair to the
disadvantaged," "based on everyday experiences that simulate real life
situations and demands. Can the worker tell which bus will take him
to work? . . . Does he understand commonly used words?" The
tests are administered orally by means of tape recordings "to enable
those with limited reading skills to demonstrate their true abilities."
Many "easy" questions were "deliberately included." Form A is sold
only to personnel departments for testing of applicants and
employees, but form B is also available to educators. Besides the Verbal
tests, a Numerical test is included in the battery.

The Verbal test includes reading of rather ordinary signs and
directions (e.g., "Shake well before using"), reading a menu, finding
information in an apartment house list or a telephone book list,
copying some very short sentences, answering questions about some
short oral announcements, recognizing twenty-five correct or
incorrect spellings of about the difficulty of machine, answering some
simple questions concerning sets of four pictures, and finding in a
multiple-choice test the best synonym for each of twenty-four words
of about the difficulty of economical. The whole Verbal test takes
about thirty minutes and consists of 100 items.

Estimate of Validity and Usefulness 

This test is probably of greater use to an employer than to a school,
which would have other means of discovering, for instance, whether
a student can read a sign or copy a sentence. In general, the test
seems well conceived, although the spelling section is subject to
criticism. It represents a fourth of the whole test (a disproportionately
large share) and, like so many other spelling tests, involves only
recognizing rather than spelling.



High School Placement Test. Series 71E. Chicago: Science Re- 
search Associates, 1968. 

The tests in this battery are Educational Ability (which includes
vocabulary and verbal analogy items, plus arithmetic), Reading-
Language Arts Achievement, Arithmetic and Modern Mathematics
Achievement, Social Studies Achievement, and Science
Methodology. Purposes of the battery, which is intended for second-semester
eighth graders and first-semester ninth graders, are to assist in
placing students in appropriate curriculums, aid in ability grouping, and
identify gifted or remedial students.

The Reading-Language Arts Achievement test consists of eighty-
five items taking fifty minutes, divided about evenly between reading
and language arts. The latter items "are designed to test skills in the
use of the English language. The student must choose the
alternatives that represent correct capitalization, punctuation, and spelling,
the best grammatical usage, and effective expression. The student's
actual use of the language, rather than his ability to memorize rules
or definitions, is measured."

The test offers four selections of a few hundred words each.
Certain words are underlined and the student is to indicate whether
each underlined word or group of words is correct or in need of one
of three suggested revisions. Then come questions testing how well
the student read the passage; some of the reading questions are
multiple-choice vocabulary items concerning words in the passage.

Estimate of Validity and Usefulness

Tests of usage, punctuation, and so on are of three basic types: those
that ask the student to recognize whether an item is "right" or
"wrong"; those that ask the student, after making such a decision,
to choose the best correction for each "wrong" item; and those that
require the student to do rather than just to recognize. Because of
problems of scoring, there are very few tests of the third kind. This
SRA test is one of the second kind, but it does not truly measure
"the student's actual use of the language," despite the claim quoted
above.

The reading selections on which the test items are based are
moderately difficult for junior high ages; a fairly high percentage of
students will be almost completely baffled by them. The language
items themselves are a mixture of the reasonably significant and the
trivial; occasionally, an item was obviously concocted by a teacher or
an editor who didn't realize that no child is likely to write such a
thing, e.g., "Some merely of these sculptors design their works. . . ."
Also, some of the synonyms to be chosen as "correct" are rather off
the mark. Despite such flaws, the publisher reports high
correlations, averaging about .60, between test scores and course grades in
two schools.






Hoyum-Sanders Junior High School English Test. Test I, forms A
and B; and Test II, forms A and B. Vera D. Hoyum and M. W.
Sanders. Emporia, Kans.: Bureau of Educational Measurements, 1964.



Each form consists of 135 items and requires forty minutes for
answering. The divisions are as follows: Part I, sentence structure
(what part of a sentence, if any, contains an error?), ten items; Part
II, capitalization, fifteen items; Part III, punctuation, twenty-five
items; Part IV, contractions, possessives, and spelling, fifteen items;
Part V, grammar and usage (recognizing errors, choosing the correct
word, and choosing the explanation of why it is correct), sixty items;
and Part VI, alphabetization, ten items.

Estimate of Validity and Usefulness

All tests, probably, have to be contrived, but this one seems more so
than most. What junior high student would debate whether to say
"had flown" or "had flied"? Many might say "had flew," but that
option isn't given. What junior high student would write "had
slided" or "The balloon rised"? How realistic for junior high
school is the choice "[1. Whoever 2. Whomever] Charles challenges,
he defeats"? And what junior high student in "real life," as
distinct from life conceived by testmakers, would ever have to wonder
about which part of this group of words, if any, contains an error?
"(1) How quickly change (2) from one (3) form (4) to another!"
This test hardly represents the apex of the art.



Iowa Placement Examination: English Training. Series ET-2, form
M. M. F. Carpenter, G. D. Stoddard, and L. W. Miller (revised by
M. F. Carpenter and D. B. Stuit). Iowa City: Bureau of Educational
Research and Service, 1941 and 1944.



This forty-five-minute test consists of three parts. Part 1, Spelling,
consists of seventy-five words in four versions for each; the student
selects the correct spelling. Part 2, Punctuation, has seventy-five
rightly or wrongly punctuated sentences. Part 3, English Usage,
offers seventy-five sentences that may or may not contain errors in
usage.






Estimate of Validity and Usefulness

This old test is no better and no worse than many newer ones. The
wrong spellings seem more lifelike than some of those conjured up
for other tests. The principles of punctuation that are represented
are unnecessarily repetitious. The English usage section has its share
of almost impossible sentences like "I am not one who they could
not interest in the fate of this gallant little band of heroes" and
"Why have you lain idle all day?" Sometimes it's better to lie idle
than to take a test like this.



Iowa Tests of Basic Skills. Levels 13 and 14, form 6. A. N.
Hieronymus and E. F. Lindquist. Boston: Houghton Mifflin Co., 1971.

The complete battery consists of eight levels (numbered 7-14) for
grades 1.7-2.5 through 8-9, covering vocabulary, reading
comprehension, language skills, work-study skills, and mathematics skills.
Only the vocabulary and language skills tests for Levels 13 and 14
(grades 7 and 8-9) were examined.

Items overlap for the various grade levels. For instance, some of
the Level 11 and 12 items carry over into 13, and some of the Level
12 and 13 items carry over into 14. In the vocabulary test, the forty-
eight items require seventeen minutes. Each item is multiple choice,
in the following form:

 Close the door
 (1) shut (2) hold (3) behind (4) open

The language skills tests are as follows:

 Spelling: forty-eight items, twelve minutes, in the form "(1) our
 (2) mi (3) your (4) them (5) No mistakes."

 Capitalization: forty-three items, fifteen minutes, in the form "(1)
 Tom and jerry (2) picked up all the (3) trash from the picnic.
 (4) No mistakes."

 Punctuation: forty-three items, twenty minutes, in the form "(1)
 We all fasten (2) our seat belts (3) before. we leave. (4) No
 mistakes."

 Usage: thirty-two items, twenty minutes, in the form "(1) He
 showed us the way. (2) Are you afraid to try? (3) Me and him
 took turns. (4) No mistakes."






Estimate of Validity and Usefulness

As the title Iowa Tests of Basic Skills indicates, these are just tests of
skills, only incidentally or coincidentally tests of knowledge,
reasoning ability, or intelligence. An ideal test of skills would require
the student to perform acts showing mastery of each skill; these,
maybe unavoidably, substitute recognition of other people's mastery or
lack of it. For example, the student is not required to punctuate but
only to determine whether someone else's sentences are correctly
punctuated.

The employment of overlapping levels seems wise. A seventh
grader, for instance, encounters test items both a little easier and a
little harder than most seventh graders can cope with, so that his or
her individual level of skills can be pretty accurately determined.

Almost without exception the individual test items appear suitable
for the junior high grades. In vocabulary, for instance, such students
should be able to define words like corridor or tolerate. In spelling,
the misspellings represented are of the sort that students often
make: decieve and knowlege, for example. In capitalization and
punctuation, items may include unnecessary capitals or marks as
well as incorrectly used ones. The usage items emphasize pronouns
and verbs (which cause students most problems), but pay some
attention to adjectives and adverbs, double negatives, redundancies,
confusion of homonyms, and miscellaneous word forms such as
sheeps for sheep.

Fewer than usual items are subject to criticism. In punctuation,
though, there is an occasional item in which corrections in two lines,
instead of the specified one, could be justified. In capitalization,
several items are of the Caspian sea type, in which the test-takers are to
indicate that sea is wrong; however, respectable journalistic practice
would retain the small letter.

The Teacher's Guide suggests three possible plans for
administering the tests: graded testing (e.g., all seventh graders take Level 13);
out-of-level testing (giving a lower-level test to a slow group, a
higher-level test to an academically superior group); individualized
testing (giving each student the level of test that seems most appropriate
for him or her). Although graded testing affords comparability of
scores, the other choices allow for considerable flexibility.

The Teacher's Guide also offers suggestions for helping classes or
individual students to improve in their areas of weakness. At least
one of the suggestions concerning usage instruction is highly
questionable: "Call the attention (orally, since sound is important in
usage) of the pupils to their errors. Contrast the correct form with
the one to be avoided." Following this suggestion might lead to
frequent interruption of students, to interference with the flow of their
thoughts, to emphasis upon "correctness" rather than content, and
to reinforcement of the stereotype that English teachers' chief
interest is in catching people making mistakes.



Iowa Tests of Educational Development (ITED), SRA Assessment
Survey. Forms X-5 and Y-5. E. F. Lindquist and Leonard Feldt.
Chicago: Science Research Associates, 1970.



The full ITED battery consists of tests in reading, language arts,
mathematics, social studies, science, and use of sources. The tests
are "designed to measure achievement in basic curriculum areas
taught in grades 9-12 today. Tests require students to think
critically, analyze written and illustrative materials, recognize
statements of fundamental concepts, and select appropriate examples
and applications of concepts. Recall of isolated information is given
little emphasis."

The language arts test consists of fifty-four items on usage
(broadly defined) and forty on spelling. The directions for the usage test
serve to describe it:

The passages that follow might have been written by high school
students. In the first two passages certain parts are underlined
and numbered. In the right-hand column there are several choices
with the same number as the underlined part. You are to choose
the version that best expresses the idea, makes the statement
grammatically correct or most precise, or is worded most
consistently with the style and tone of the passage. Some items involve
more than one kind of error. For example, you may find both
grammatical and capitalization errors in the same item. In some
cases the problem is not to correct a specific error, but to decide
which phrase is most appropriate, considering the situation as a
whole.

The final twelve items deal mainly with paragraph structure.

The spelling test consists of groups of four words, one of which 
may be spelled incorrectly. The student is asked to select the mis- 
spelling or to indicate that none is wrong. 






The total language arts test requires forty minutes testing time.
The same test is intended for all grades, 9 through 12; that is, there
is not a separate test for each grade. The existence of two forms
makes possible a comparison of scores at whatever time interval is
desired. Only form X-5 was examined for this review.

Estimate of Validity and Usefulness

The kinds of items covered in the usage test are summarized as
follows in the Handbook for Teachers and Examiners:



                                  Form X-5   Form Y-5

 Capitalization and punctuation      10         10
 Verbs, adverbs, nouns                9         11
 Sentence structure                   9          9
 Appropriateness of writing           7          8
 Conciseness and clarity             12         11
 Organization and development         7          5



These figures suggest that if an examiner wants a test with major
emphasis on capitalization, punctuation, and use of conventional
verbs and pronouns, he or she should look elsewhere, since other
tests under review have substantially more items on these topics. But
if the examiner wants a broader, though shallower, view that
includes the last four features in the list above, he or she should
consider this test seriously. Since those four factors are significant,
especially for their applicability to writing, the makers of this survey test
were wise to include them.

"In many situations [in this test] the primary issue is one of style
and fluency rather than clear-cut error," says the Handbook. Good.
In actual student writing such flaws usually outnumber the definite
"errors" that textbooks tend to concentrate on, but few tests tend to
pay much attention to matters of style and fluency.

"An attempt has been made to avoid usage and practice on which
there is substantial disagreement between language authorities."
Good again. Other reviews in this book show that not all testmakers
follow this principle consistently.

"The test . . . does not cover many elementary skills previously
mastered by almost all high school students." If, then, some of your
students say "they was," this test won't show you that. "The test
must be considered primarily a survey test, and clues or indicators of
particular student weaknesses must be corroborated by careful
follow-up investigation."






"The present test is to be considered a complement to, not a
substitute for, evaluations of student compositions." Some testmakers
seem to imply that a usage test can tell a teacher how well each of his
or her students writes. Not so, of course, since writing consists of
much more than usage choices. As Lindquist and Feldt recognize,
even a relatively broad test like this can reflect only a few of the
characteristics of a student's ability to use the language.

The spelling test is not unusual in format. The words included are 
among those that most often are found in lists of common misspell- 
ings. 

Like most other tests, the ITED stress the ability to recognize
rather than the ability to do. But these tests have fewer flaws than
the majority.



Kansas Junior High School Spelling Test. Test I, forms A and B,
and Test II, forms A and B. Mary 7. Williams, M. Sanders,
Connie Moritz, and Alice Robinson. Emporia, Kans.: Bureau of
Educational Measurements, 1964.



Each form covers eighty-five words and requires fifteen minutes to
complete. Four spellings of each of the words are given; the student
selects the correct one. Words included were selected with the aid of
the Buckingham Extension of the Ayres Spelling Scale, the Iowa
Spelling Scale, the Thorndike word list, and "a number of
recognized spelling texts."

Estimate of Validity and Usefulness

The words appear about right for junior high level, but some of the
misspellings appear farfetched; misspellings actually written by a
number of students should be used rather than apparently dreamed-
up misspellings like rcal (racial), triangl, phresh, and rhbarb
(rhubarb). Probably, though, misspellings should not be used at all in a
spelling test, because of their possibly being remembered; after
looking at 1,360 misspellings in the four forms of this test, this
reviewer had difficulty in spelling the word errors. (Is it errorz, or
erors, or airrirs, all listed among the possibilities?) It is difficult to
argue (arga, arega, rgue) that test designers cannot find some better
measure (mavi^un-. meii:^huri\ ma:^ttre), one that would require
actual spelling of words rather than just (1) rcvoitiiwn (2) recogniion
(3) recogmshmt (4) recogniiiott.



Illinois Tests in the Teaching of English: Knowledge of Language.
Competency Test A. William H. Evans and Paul H. Jacobs.
Carbondale, Ill.: Southern Illinois University Press, 1972. (See Purves
review p. 130 for Knowledge of Literature test.)



This test differs from the others under review in that it is intended
only for teachers or for college students preparing to become teach-
ers. Other tests in the battery are Attitude and Knowledge in Writ-
ten Composition, Knowledge of Literature, and Knowledge of the
Teaching of English. The tests were developed noncommercially,
under a federal grant, as one segment of a statewide effort in Illinois
to upgrade preparation of teachers of English. Experimental edi-
tions of the tests were field tested in various colleges and universities,
and "fifty nationally known experts," listed in the Test Administra-
tor's Manual, offered critical analyses.

The language test consists of eighty-four multiple-choice items.
The items pertain to "statements and terms used to describe how
language functions, . . . the principles of semantics, . . . three sys-
tems of English grammar, . . . history of the English language, in-
cluding its phonological, morphological and syntactic changes, and
concepts about levels of usage and dialectology, including the cul-
tural implications of both." The test is not timed, although the time
usually required is forty-five to sixty minutes.

Estimate of Validity and Usefulness

Some of the items involve rather philosophical points, such as a
satisfactory definition of language or criteria for the value or worth
of a language. Others probe the teacher's awareness of the most
important tasks of the semanticist, the lexicographer, or some other
variety of language scholar. Historical questions concern matters
like lexicography and spelling, not just phonology, morphology, and
syntax. The questions about dialects tend not toward specific points,
such as variant pronunciations or lexical items, but toward an
assessment of the teacher's attitudes toward dialect and his or her
understanding of principles of dialectology. In its grammatical ques-
tions the test assumes that the test-taker is acquainted with all three
of the present major descriptions of the language; it asks about
characteristics of each and offers items that require the test-taker to
apply certain principles of each of the grammars.

In other words, this test is totally unlike a test for students, which 
conventionally gets only into matters of "right" vs. "wrong" or— 
especially in the past— into grammatical identification of subjects, 
predicates, etc. This test for teachers probes much deeper and mea- 
sures the teacher's familiarity with underlying philosophies of lan- 
guage and competing descriptions of it. This is as it should be, for 
the teacher who knows only such things as the rules for was and were
and the superficial distinctions between a phrase and a clause will 
inevitably be shallow in the teaching of the language. 

The test is not an easy one, nor should it be, since language has so 
many ramifications and complexities. In a preliminary seventy-
eight-item version, no prospective teacher of English among 245 an-
swered more than 62 correctly; the low score was 25. Inevitably, in a
test that is simultaneously as broad in coverage and as deep as this, 
scores will not be high. But a teacher who can score 60 or so will 
assuredly be one whose teaching of the language can be far superior
to that of the teacher whose knowledge is quite superficial — other 
things being equal. 

The Handbook recommends five possible uses for the tests in this
battery: (1) in preservice education, as an aid to academic and pro-
fessional advising; (2) during early training in the field (student
teaching or the first year of internship or full-time teaching); (3) for
self-assessment (which teachers typically do too little of); (4) for
assessment of applicants for teaching positions; and (5) during or 
after inservice workshops. 

If all teachers of English during at least one of these stages were to 
be measured by this test and others in the battery, it seems likely 
that within a few years the level of preparation would begin to rise 
dramatically. 



Content Evaluation Series, Language Arts Tests, Language Ability
Test. Form 1. Ellen Graser (Kellogg W. Hunt, series editorial advis-
er). Boston: Houghton Mifflin Co., 1969. (See Braddock review p.
122 for Composition test and Purves review p. 133 for Literature
test.)



Intended for grades 7-9, the "Language Ability Test seeks to assess
(1) the student's grasp of important principles underlying the con-
struction of the English sentence and (2) the student's ability to use
sentence elements in standard sentence patterns. To realize those
objectives, the test concentrates on the distinctive areas which re-
search has shown to be of greatest importance in language develop-
ment — namely, sentence structure, word form and function,
mechanics, and diction." Working time for the fifty-eight items is
forty minutes. The test "does not try to measure how well a student
can define and classify the elements of language, nor try to measure
how well he can use technical terms to describe the functions of such
elements." Nor does it insist that the only choices that can ever be
made in a language test are "right" and "wrong."

As a result of these intentions, the author has put together a test
that is unusual and unusually hard to describe. The items are of per-
haps a dozen different kinds, of which fewer than half are concerned
with the conventions of punctuation, capitalization, spelling, and
usage. The others attempt to probe students' understanding of how
sentences function, through asking them to think about which words
might be substituted for which; which sentences in ordinary lan-
guage follow the same patterns as sentences with nonsense words;
what reply (in standard English) would be suitable to a question like
"When will the roft be kunkeled?" or a command like "Never
raddel the crompums"; and which standard English sentence is put
together in the same way as another standard English sentence on an
entirely different subject.

Estimate of Validity and Usefulness 

The teacher who wants to know only how well his or her students can
spell or punctuate or identify complex sentences will not like this
test, for it won't do that. But the teacher who is especially interested
in how well his or her students really have a working understanding
of sentence construction will find it enlightening. And both teachers,
by studying this test and the manual's discussion of its contents, may
learn things that will benefit their teaching.



McGraw-Hill Basic Skills System, Spelling and Vocabulary Tests.
Form A. Alton L. Raygor. Monterey, Calif.: McGraw-Hill Book
Co., 1970. (See Braddock review p. 123 for Writing test.)



In each of the fifty items in this twenty-minute test for grades 10-13,
the student encounters four different underlined words in sentence
context. One of the four words may be misspelled; the other three,
though spelled correctly, are taken from lists of frequently mis-
spelled words.

Estimate of Validity and Usefulness

Anyone who prepares a published test and manual should be metic-
ulous. In explaining that the words in this test were chosen on the
basis of an extensive study, the author of the manual states, "Dr.
Thomas Pollack reported this study in the Journal of College En-
glish." It shouldn't be Pollack; it's Pollock—a distinguished pro-
fessor, university administrator, and past president of NCTE whose
name anyone writing about spelling should be able to spell correctly
(the name is misspelled at least five times in the manual, and cor-
rectly once). And it shouldn't be Journal of College English; it's Col-
lege English. When the author refers so carelessly or sloppily to his
basic source, one wonders how much trust can be placed in his test.

Actually, it has some good qualities, even though, as usual, it's a
spelling test that doesn't require the student to spell. Giving the
words in context makes possible the differentiation of words like
personal and personnel. Also, it's probably better to have only one mis-
spelling to three correct spellings rather than the other way around.
The distractor words, too, as noted above, are themselves words
often misspelled.

The Vocabulary test has two six-minute sections. Section 1 consists of thirty
words chosen from beginning college textbooks in various fields;
four possible synonyms are given for each. Section 2 has twenty-five
"artificial words, created from parts whose meanings are well estab-
lished." For example:

pyrophile: 1. a blacksmith's tool 2. a builder of pyramids
3. one who loves fire 4. one who tells lies

Estimate of Validity and Usefulness 

Section 1 is ordinary in concept and execution; the words seem
appropriate for grades 10-13 and are drawn from both the sciences
and the humanities and arts. Section 2, however, offers a praise-
worthy and interesting departure from the ordinary. The student
who has a good vocabulary and who has thought much about words
will associate "pyrophile," for instance, with words like pyromania
and bibliophile, and come up with the desired answer, "one who
loves fire." Thus this part of the test should be an excellent indicator
of vocabulary strength.



Minnesota High School Achievement Examinations, Language Arts.
Tests 1-6, form EH. V. L. Lohmann, editor. Circle Pines, Minn.:
American Guidance Service, 1974.



The total battery consists of twenty-six achievement tests for junior
and senior high schools. There are separate Language Arts tests for
each grade, 7 through 12. The manual claims that "the questions
selected for the test reflect the ever-changing Minnesota courses of
study." Thus the tests are geared specifically to Minnesota users,
but are also "generally applicable," according to the manual, to
other states.

Classes of items vary somewhat from grade to grade. Thus grade 8
has items on spelling (fifteen); vocabulary (twenty); kinds of sen-
tences (i.e., declarative, etc., nineteen); capitalization and punctua-
tion (twelve); grammatical usage (finding errors in, ten); usage of
words (ten); faulty expression (five); verb tense (ten); kinds of sen-
tences (i.e., simple, etc., nine); grammatical terms (e.g., completing
a definition of a grammatical term, fifteen); and literature (twenty-
five). Grade 12 repeats several of these, though in somewhat varied
fashion, but also has entries on library skills (mostly indicating
sources of information, fifteen) and composition (e.g., finding the
"best" sentence, twenty-eight).

Estimate of Validity and Usefulness

These tests are poor in conception and in execution. Certainly lan-
guage arts — especially in Minnesota, where Dora V. Smith and
Harold Allen have labored so diligently — devotes less attention than
these tests suggest to sentence classification, tense identification,
and sorting out parts of speech. Certainly Minnesota teachers would
not uniformly agree that "real good time" is definitely an error in
usage. And how important is it for an eighth-grader to know Icha-
bod Crane's occupation? These tests are revised rather frequently.
It is hoped that future editions will be better.



Missouri College English Test. Form A. Robert Callis and Wil-
loughby Johnson. New York: Harcourt, Brace & World, 1964.

Only form A was examined for this report. It is "for general college
use and for use with high school seniors." Form B "is reserved for
use in colleges and universities exclusively." Form C is for "situa-
tions demanding a 'secure' test."

The test consists of ninety items and requires forty minutes. Two-
thirds of the items involve spotting errors in paragraph context, in
punctuation, capitalization, "grammar," and spelling. Ten items re-
quire students to find the best sentence in a group of four. The re-
maining twenty items are sentences in four scrambled paragraphs
that the student is to place in the best possible order.

Estimate of Validity and Usefulness

Despite its emphasis on location of errors, this is one of the best
available tests for high school upperclassmen or college freshmen.
The passages in which the "errors" are embedded appear more
realistic, less concocted than most; some are indeed doctored ver-
sions of student writing. The "find-the-best-sentence" items are in-
teresting and challenging; again the poor sentences seem realistic.
Ability to rearrange the scrambled paragraphs is a fine measure of a
student's understanding of principles of organization.

During development of the test, whenever it was found "that the
large majority of beginning college freshmen had already mastered
the knowledge or skill being measured by a particular item, that
item was omitted even though it was logically a part of the do-
main. . . . Thus the test comprises those items considered by com-
petent judges to be valid measures of specified skills and abilities not
yet fully mastered by the majority of beginning college freshmen."
The list of punctuation and "grammar" items that survived this
screening is informative: punctuation between independent clauses,
with parenthetical, restrictive, and nonrestrictive elements, and to
show possession; verbs (agreement, tense, principal parts); pronouns
(case, relative, reference); adverbs distinguished from adjectives;
and special cases.

A truly superior test, of course, would measure the student's abil-
ity to do rather than to recognize. It also might include some kinds
of items (e.g., vocabulary) that this test does not. But until the supe-
rior test comes along, this one is a pretty good substitute!




Sanders-Fletcher Spelling Test. Test I, forms A and B; and Test II,
forms A and B. Gwen Fletcher and M. W. Sanders. Emporia,
Kans.: Bureau of Educational Measurements, 1964.



Each form, to be used in grades 9-13, covers 150 words. The first
ninety are in a list, with about half of them misspelled; the student
is to decide which ones. Part II presents twenty-five pairs of words,
like stationary and stationery, in sentences; the student chooses the
one required by the sentence. In Part III, the student faces groups of
five different words (e.g., "clique, detergent, predjudice, trek, debt-
or"), with the one misspelled to be selected.

Estimate of Validity and Usefulness 

Some of the misspellings appear unlikely, e.g., prohesied. However,
more serious is the fact that some spellings considered wrong are
recognized as alternative spellings in Webster's Third: propellor,
liquify, and payed (at least in the sense of "payed out the rope").

The trouble with most spelling tests, including this one, is that 
they measure something other than spelling ability. They measure 
the ability to recognize a spelling as correct or incorrect; this is by no
means the same thing as the ability to write a word correctly. 

Another weakness of many spelling tests, including this one, is
that they do not diagnose at all the kinds of spelling troubles that a
given student has. The test authors admit as much in the manual,
where they say, not very helpfully, "Should it be found that some
students have difficulties which cannot be readily located by use of
this test, several diagnostic tests should be obtained or constructed
and administered. After the specific weaknesses and handicaps are
located, remedial measures may be applied intelligently." So what's
the use of giving this test?



Sanders-Fletcher Vocabulary Test. Test I, forms A and B; and Test
II, forms A and B. Gwen Fletcher and M. W. Sanders. Emporia,
Kans.: Bureau of Educational Measurements, 1964.



Constructed for grades 9-13, each of the four forms requires forty
minutes for answering and consists of 100 items. Seventy-five of the
items are multiple choice, asking students to choose the best defini-
tion among five possibilities; twenty-five items require only a plus or
a minus to indicate proper or improper use of a word. Words in the
test were chosen from the Pressey lists of basic vocabulary in a num-
ber of high school subjects and from "other supplementary lists."
Words were checked against the Thorndike word lists in an attempt
to make the various forms comparable in difficulty.

Estimate of Validity and Usefulness 

It is difficult to defend a few of the word choices. How important is 
it, for instance, that a student know dudgeon, gimp, counterpane, 
marzipan, snickersnee, veldt, and Caledonia? Some such words 
certainly deserve inclusion to determine the vocabulary level of the 
occasional widely read student. Most of the words are of the ap-
proximate levels of dexterous and mundane.

The authors say that test results may be used "(1) for determining
pupil achievement; (2) for checking the efficiency of instruction; (3)
for analyzing pupil and class weakness; and (4) for motivating pupil
effort." They add that if a student's paper is studied, his or her
weaknesses may be found and "remedial measures may be applied
intelligently." It is difficult to agree with these statements unless the
class has been engaged specifically in vocabulary study that happens
to have included the words in these tests. And if a student's test
shows that he or she doesn't know marzipan and counterpane and
the like, just what "remedial measures may be applied intelligently"
except to encourage him or her — like every other student— to read
more books, see more places, do more kinds of work, play a greater
variety of games, have more experiences?

Essentially the same claims of usefulness, incidentally, are used
for other tests in this series, generally in identical words. Yet a
vocabulary test should have somewhat different uses than, say, a test
of sentence structure, usage, or punctuation. 



Cooperative Sequential Tests of Educational Progress (STEP),
Listening. Forms 1A, 1B, 2A, 2B, 3A, 3B, 4A, and 4B. Princeton,
N.J.: Educational Testing Service, 1957. (See Braddock review p.
125 for Writing test.)



Other tests in this battery covering grades 4-14 include reading,
writing, essay, social studies, science, and mathematics. Each
Listening test consists of two thirty-five-minute segments, with
thirty-six or forty items in each segment. The administrator reads a
short passage aloud, typically about two minutes for each of twelve
passages. The student's test booklet provides several multiple-choice
questions on the passage.

The test, says the manual, "Measures ability, through listening to
passages read by the teacher or test administrator, to comprehend
main ideas and remember significant details, to understand the im-
plications of the ideas and details, and to evaluate and apply the
material presented. Materials include directions and simple ex-
planation, exposition, narration, argument and persuasion, and
aesthetic material (both poetry and prose)."

Estimate of Validity and Usefulness

The selections for student listening are likely to be of at least reason- 
able interest to most students, and they vary greatly in style and con- 
tent. None requires much more than three minutes of listening time. 
(If a teacher wants to measure the ability to listen to and retain in-
formation about a considerably longer selection, the Brown-Carlsen
Listening Comprehension Test, discussed earlier, should be con-
sidered.) The questions are intelligent and not excessively picky.
They do measure, as the quotation from the manual states, much
more than the recollection of detail. A recording of the passages to 
be listened to might be preferable to reading by the teacher, espe- 
cially if test results are to be compared with norms or with results in 
classes of other teachers. 



Stanford Achievement Test (SAT). Primary Level II, Intermediate
Level I, Advanced, and Test of Academic Skills; form A. Richard
Madden, Eric F. Gardner, Herbert C. Rudman, Bjorn Karlsen, and
Jack C. Merwin. New York: Harcourt Brace Jovanovich, 1973. (See
Purves review p. 128 for SAT High School Arts and Humanities test.)



In all, the SAT covers six levels. Not reviewed here are Primary Level
I, mainly for grades 1.6-2.5; Primary Level III, mainly for grades
3.6-4.5; and Intermediate Level II, mainly for grades 5.6-6.9. There
is also a Stanford Test of Academic Skills ("Stanford TASK") in-
tended for grades 9-13 and not included in this review.

Coverage of test batteries varies, but in general a battery includes
vocabulary, reading comprehension, word study skills, spelling, lan-
guage, mathematics, social science, and science. All the tests for a
single level are contained in a single booklet.

Here is a summary of the contents of the three tests under review:

Primary Level II

Vocabulary — Thirty-seven three-choice items, twenty minutes.
Pupil marks the word that best fits into a sentence read by
the teacher.

Word Study Skills — Sixty-five three-choice items, twenty-five
minutes. A. Pupil selects the word that matches what the
teacher pronounces. B. Pupil matches a sound in one word
with that in another.

Spelling — Forty-three three-choice items, twenty-five minutes.
Pupil indicates each spelling as "right," "wrong," or "don't
know."

Listening Comprehension — Fifty multiple-choice items, thirty-
five minutes. Pupil follows instructions read by the teacher.

Intermediate Level I

Vocabulary — Fifty multiple-choice items, twenty-five minutes.
Pupil marks the word that best fits a sentence.

Word Study Skills — Fifty-five multiple-choice items, twenty-
five minutes. A. Pupil matches sounds. B. Pupil observes
which syllables, when put together, will form words.

Spelling — Fifty multiple-choice items, fifteen minutes. A.
Homonyms. B. Pupil chooses the one incorrect spelling in a
group of four words.

Language — Seventy-nine multiple-choice items, thirty-five
minutes. A. Usage items include verbs, pronouns, punctua-
tion, capitalization, and miscellaneous matters of diction.
B. Pupil determines whether a group of words is a complete
or an incomplete sentence. C. Pupil determines whether a
group of words is a sentence, two sentences, or no sentence.
D. Pupil answers questions on dictionary entries.

Advanced Level

Vocabulary — Fifty multiple-choice items, twenty minutes.
Pupil selects word that best finishes a sentence.

Spelling — Sixty multiple-choice items, twenty minutes. A.
Homonyms. B. Pupil chooses the one incorrect spelling in a
group of four words.

Language — Seventy-nine multiple-choice items, thirty-five
minutes. A. Assorted usage items, in context. B. Pupil
determines whether a group of words is a sentence, two sen-
tences, or no sentence. C. Pupil answers questions on dic-
tionary entries. D. Pupil answers several questions on refer-
ence books, literary concepts, and grammatical concepts.

Estimate of Validity and Usefulness
Strong points of these tests include:

1. a practice test for the primary level to introduce small children 
to possibly unfamiliar kinds of tests; 

2. vocabulary items that with few exceptions appear suitable for 
the indicated grade levels; 

3. word study skills that are simple but basic and sometimes 
slightly imaginative in presentation; 

4. spelling tests no worse than average;

5. an excellent listening comprehension test for the primary level, 
a test that may give some teachers ideas about why and how to 
teach this still-too-neglected skill;

6. language tests that stress such basic matters as verbs and pro-
nouns and that measure to a reasonable extent pupils' ability to
distinguish sentences from sentence fragments and run-ons. 
The testmakers have pretty well kept up with recent language 
study. 

Weaknesses tend to be in a small number of individual items. A
few usage items are subject to question, especially with regard to
capitalization; for instance, the test insists on Center Street, even
though journalistic style uses a small s. It insists also on the debat-
able King and Prime Minister of England. Also, in the sentence-
identification segments of the Language test for both intermediate
and advanced levels, fewer than a fifth of the items, instead of the
expected third, are of the run-on variety so commonly found in
school and even college writing. In the intermediate Language test,
parts B and C overlap in purpose and method. One test insists on a
comma after an introductory adverbial clause even when the clause
is rather short and there is no chance of misreading; editors of
reputable magazines often omit the comma in such a case. In an-
other part of a language test, pupils are asked whether their finger-
nails rattle, chatter, whistle, or screech on the blackboard; this re-
viewer tried to find out empirically and succeeded in attaining a rat-
tle, a screech, and even a sort of whistle, and hence finds that item
confusing. Perhaps some of the vocabulary words in the Advanced
test are a bit stiff for eleven- to fourteen-year-olds: quiescence, con-
clave, and anomalous, for instance, but maybe those words are
needed to accommodate the few people who would otherwise leap off
the top of the scale. And these testmakers, like others, have still not
found a really satisfactory spelling test, although the words they use
are better chosen than most and the misspellings are among those
that children actually write.

Despite such little flaws, these tests should certainly be included
among those to be considered by an elementary, middle, or junior 
high school interested in obtaining a reasonably comprehensive 
understanding of pupils' individual achievements in the academic 
areas considered here. 



Tests of Academic Progress, Composition. Form S. Dale P. Scan-
nell. Boston: Houghton Mifflin Co., 1971. (See Purves review p. 135
for Literature test.)



The Composition tests for the four grades are contained in the same 
booklet, with overlapping items. That is, grade 9 does exercises 1-64; 
grade 10, 19-83; grade 11, 47-111; and grade 12, 65-130. In general,
the lower numbered exercises refer mainly to mechanics and verb
and pronoun usage, and the later exercises pay more attention to
paragraphing and slightly more difficult matters of mechanics and 
usage. The test consists of short prose pieces (e.g., a letter and a 
book report), followed by multiple-choice questions; lists of terms 
which are to be grouped together according to given categories; and 
nonsense verbs (e.g., *'to neg") for which five tense forms are given, 
followed by sentences in which one of the forms must be inserted. 
About sixty-four items are done by each grade. 
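
The overlap among the four grade forms can be checked with a little arithmetic. The following Python sketch is an illustration only, not anything taken from the test or its manual. It assumes the exercise ranges quoted above are inclusive, and the names GRADE_RANGES, item_count, and shared_items are invented for the example.

```python
# Exercise ranges per grade as quoted in the review (assumed inclusive).
GRADE_RANGES = {9: (1, 64), 10: (19, 83), 11: (47, 111), 12: (65, 130)}


def item_count(first, last):
    """Number of exercises in an inclusive range."""
    return last - first + 1


def shared_items(range_a, range_b):
    """Number of exercises that two inclusive ranges have in common."""
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    return max(0, high - low + 1)


# Exercises attempted by each grade.
counts = {grade: item_count(*bounds) for grade, bounds in GRADE_RANGES.items()}

# Exercises shared by each pair of adjacent grades.
overlaps = {
    (grade, grade + 1): shared_items(GRADE_RANGES[grade], GRADE_RANGES[grade + 1])
    for grade in (9, 10, 11)
}

for grade, count in counts.items():
    print(f"grade {grade}: {count} exercises")
for pair, shared in overlaps.items():
    print(f"grades {pair[0]} and {pair[1]} share {shared} exercises")
```

Under those assumptions the per-grade totals run from 64 to 66, in line with the "about sixty-four" figure, and adjacent grades share between 37 and 47 exercises.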

Estimate of Validity and Usefulness 

The prose selections would be moderately interesting to high school
students. The questions about paragraphing, however, are rather
good. The device of making all grade level tests available in over-
lapping form could result in some interesting statistical comparisons
within a school.



Tests of Adult Basic Education. Forms M and D. Monterey, Calif.:
California Test Bureau/McGraw-Hill Book Co., 1967.

These tests are adapted from the 1957 edition of California Achieve-
ment Tests, WXYZ Series, devised by Ernest W. Tiegs and Willis W.
Clark. Other tests in the adult battery are in reading and arithmetic.
Form M is of medium difficulty and form D, "difficult." (There is
also a form E, "easy," but it has no language section.) In general,
the tests are intended "to meet a growing need for instruments espe-
cially designed to measure adult achievement in the basic skills of
reading, arithmetic, and language."

In form M, the first thirty-seven items refer to capitalization, with
the test-taker expected to indicate each time which one of four words
needs to be capitalized. In each of the next thirty-four items the test-
taker indicates whether any mark of punctuation is needed. The
next twenty-six items require a choice between pairs of words like
flew, flown or too, two or better, best. Then come nine items that
may or may not be sentence fragments. The test ends with thirty sets
of four words, one of which may be misspelled. Total working time
for the 136 items is thirty-eight minutes. Form D is similar in con-
tent.
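
The item counts just given can be tallied to see how the form M language test divides its attention. This Python sketch is only an illustration, under the assumption that the five groups described above account for every item. The section labels and variable names are invented here and do not come from the manual.

```python
# Item counts for form M as described in the review; labels are illustrative.
FORM_M_SECTIONS = [
    ("capitalization", 37),
    ("punctuation", 34),
    ("word choice", 26),
    ("sentence fragments", 9),
    ("spelling", 30),
]

total_items = sum(count for _, count in FORM_M_SECTIONS)

# Items devoted to capitalization and punctuation together.
mechanics_items = sum(
    count
    for name, count in FORM_M_SECTIONS
    if name in ("capitalization", "punctuation")
)
mechanics_share = mechanics_items / total_items

for name, count in FORM_M_SECTIONS:
    print(f"{name:20s} {count:3d}  {count / total_items:.0%}")
print(f"capitalization + punctuation: {mechanics_items} of {total_items} "
      f"({mechanics_share:.0%})")
```

On these figures the five groups add up to the stated 136 items, and capitalization and punctuation together account for 71 of them, slightly more than half of the test.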

Estimate of Validity and Usefulness

According to the manual, test items from the California Achieve-
ment Tests have been revised to make them more suitable for adults.
The attempt is by no means completely successful, with its accounts
of trips to New York with Mom and Dad, Tom's endeavor to whee-
dle a motorcycle out of Uncle Ed, and the like.

The amount of emphasis on capitalization and punctuation is ex-
cessive: 71 of 136 items in form M, 59 of 129 in form D. The spell-
ing test, like most others, involves recognition only. Some of the
items in mechanics and usage are at best questionable. Must river be
capitalized in Missouri river? Publishers' stylebooks differ on this,
so why include the item? In my aunt Betsy, is it imperative that
aunt be capitalized, as the scoring key says? After all, we don't
write my Brother Ben. May siatk be used as past tense? Webster's
Third says so, but not the makers of this test.



Thurstone Test of Mental Alertness. Forms A and B. Thelma G. and
L. L. Thurstone. Chicago: Science Research Associates, 1952 (form
A) and 1953 (form B).



As many as possible of the 126 items are to be finished in twenty
minutes. Some ask for definitions in this way: "the letter that begins
the name of the first meal of the day." Others are same-opposite,
e.g., "What word means the same as or the opposite of the first word
in the row: many A. ill B. few C. down D. sour."

Estimate of Validity and Usefulness

The test is intended to aid in the selection, placement, and evalua-
tion of employees; in schools, it is said to reflect ability to compre-
hend complex material, forecast success in academic subjects, and
afford a comparison with the scores of persons in vocational cate-
gories. Statistics in the manual suggest that the test performs these
functions rather well. Male college graduates, for example, score
considerably higher than automobile salesmen, most of whom, at
least at the time of standardizing, were presumably not college
graduates, but automobile salesmen score higher than retail sales
personnel in general. Students' grade point averages correlate fairly
closely with their test scores.

The vocabulary words in the test are of reasonable degrees of dif-
ficulty, e.g., aqueous but not vitreous. The number of items to be
completed in twenty minutes requires fast reaction, i.e., alertness.



Walton-Sanders English Test. Test I, forms A and B; and Test II,
forms A and B. Charles E. Walton and M. W. Sanders. Emporia,
Kans.: Bureau of Educational Measurements, 1964.



There are three parts, totaling 150 items, in this fifty-minute test for
grades 9-13. Part I, The Word, is subdivided into a twenty-five-item
section on vocabulary (multiple-choice definitions); a fifteen-item
section on syllabication (counting the number of syllables in a word);
and a twenty-item section on spelling (finding the misspelled word in
a group of five). Part II, The Sentence, has twenty items on identify-
ing parts of speech; twenty items on use of nouns, pronouns, and
adjectives within sentences; and fifteen items on identification of
verbals. Part III consists of thirty-five items on punctuation.

Estimate of Validity and Usefulness

There is little value in counting syllables in a word, yet this
represents a tenth of this test. The fifty-five items on the sentence all
involve mere identification. There are no items on usage, and, as is
true of many other tests, nothing on sentence construction. A test 
just like this could have been constructed at the turn of the century 
and would have reflected what at that time was happening in the 
classroom. 
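
The item counts reported in this review can be tallied to confirm the
proportions the reviewer cites. The Python sketch below is only an
illustration: the counts come from the description above, while the
dictionary labels and the helper name `share` are shorthand assumed
here, not terms used by the test publisher.

```python
# Item counts as reported in the review of the Walton-Sanders English Test.
# The labels are illustrative shorthand, not the publisher's terminology.
SECTIONS = {
    "vocabulary": 25,
    "syllabication": 15,
    "spelling": 20,
    "parts_of_speech": 20,
    "noun_pronoun_adjective_use": 20,
    "verbals": 15,
    "punctuation": 35,
}


def share(section: str) -> float:
    """Return the fraction of all items that belong to one section."""
    return SECTIONS[section] / sum(SECTIONS.values())


total_items = sum(SECTIONS.values())
sentence_items = (
    SECTIONS["parts_of_speech"]
    + SECTIONS["noun_pronoun_adjective_use"]
    + SECTIONS["verbals"]
)
print(total_items, sentence_items, share("syllabication"))
```

Under these counts the three parts sum to the stated 150 items, the
sentence sections to fifty-five, and syllabication accounts for one
tenth of the test, as the review says.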



Wide Range Achievement Test (WRAT), Revised Edition, Levels I
and II. J. F. Jastak, S. W. Bijou, and S. R. Jastak. Wilmington, Del.:
Guidance Associates of Delaware, 1965.



The WRAT has sections on spelling, arithmetic, and reading. Level
I is for ages 5 years through 11 years, 11 months; Level II, 12
through adult. In each test, easy items come first, with difficulty
increasing steadily and rapidly.

The Level I spelling test consists of forty-five words from simple
monosyllables to words that would be fairly difficult for junior high
students. For younger students there are also brief sections involving
the drawing of letter-shaped designs and the letters of the child's
name. Level II has forty-six words and also starts with
monosyllables, but moves in larger steps, ending with words that would
not be in most high school students' vocabularies. Total scores for
both tests are equated with typical grade levels; thus, a Level I
adjusted score of 50 equals grade level 7.2. In giving the test, the
examiner dictates the words in sentence context. Thus, this test,
unlike most, actually requires the test-taker to write the word and not
just recognize whether a spelling is correct or not.






Estimate of Validity and Usefulness 

The manual devotes several pages to statistical evidence concerning
the validity of the test. The evidence seems to show high correlations
with other kinds of tests. A teacher who wants an estimate of how
well a class or an individual spells in relation to national norms
should find this test useful. It cannot, however, serve a diagnostic
purpose; that is, the particular kinds of errors a given student or
class makes cannot be accurately analyzed.






Evaluation of Writing Tests 

Richard Braddock



Unless one wishes to restrict one's conception of "writing" to the
kind of essays employed in classrooms largely to test a student's
understanding of a literary text or proficiency in avoiding certain
"errors" covered by an English handbook, one sees writing as a
staggeringly complicated and varied process. Some modes of writing are
highly personal, like the kind found in self-initiated journals or
individual pieces written to get something off the writer's chest or to
record events or feelings for the writer's later reference. Unless the
writer solicits the reactions of someone else to such writing, it is no
one else's affair. Other writing, which seeks to communicate with
readers removed in space or time, is a more public affair and at
times may properly become the concern of the schools and even
society at large. It is apparent that a writer's prospective readers may
vary from an intimate friend who knows him or her well to a loosely
defined category of people whom the writer has never seen and
whose sense of values, set of experiences, and dialect of English
differ from his or her own. Furthermore, a writer may have widely
varying purposes from time to time: to explain to readers something
about which the writer claims a special understanding; to get
readers to share a feeling for something; to sell them something; to solicit
their votes.

When "writing" is viewed as a varied and complicated concept,
one readily sees that "writing ability" is a pair of words which has
little meaning when taken in the abstract. Are there definable
characteristics, understandings, and skills which are common to all
modes of writing, for all kinds of prospective readers, for all possible
purposes of writers? How can one consider "writing ability" in
general when no one knows what it is? It may not even exist.

Of course there is a tendency for all types of writing to have
common characteristics of sentence structure, word meaning, and
mechanics (spelling, punctuation, capitalization, handwriting and
typing, and general format). That this is no more than a tendency,
though, is readily apparent to anyone who has compared these
features in a scholarly article, an Uncle Remus story, and a cigaret ad.
For some readers, the most communicative description of the plight
of a black ghetto dweller may be cast in their own variety of English,
not exclusively in the forms which English handbooks offer as
standard. Still, one can say that standard English tends to have common
characteristics if one is referring to the matters listed at the
beginning of this paragraph. Accordingly, tests have been devised to
indicate the degree to which students can distinguish between standard
and nonstandard forms about which there is little disagreement.
Even these tests must be revised from time to time, however, as
people's consensus shifts about what is standard and what is not.

But to suggest that tests over such standard forms are tests of
writing ability is patently absurd for two reasons. First, such tests
sample students' ability to distinguish between standard and
nonstandard forms which someone else has written, not their ability to
write their own. At best, these tests evaluate proofreading ability or,
more charitably but less accurately, editing ability. Second, such
tests make no attempt to measure students' ability to accomplish
other aspects of writing. No commercially available standardized
test attempts to measure a student's ability to select a subject, and
an approach and a mode for it appropriate to the writer and the
prospective readers. No commercially available standardized test
attempts to measure a student's ability to organize and detail his or
her writing so that prospective readers can share the writer's
experience and appreciate his or her purposes. At this stage of our
understanding of writing and of testing, it is difficult to believe that any
standardized test will be constructed which can measure such
abilities. Therefore, anyone who professes to evaluate "writing ability"
with a standardized test is either telling a falsehood or speaking
from ignorance. It is a cruel joke that some corporations are today
selling school boards on the notion that they can do just that. And it
is ironic that some colleges are excusing students from composition
requirements merely on the basis of standardized test scores.

Roughly predicting a student's ability to do well in a composition
course is something else. If large numbers of students are to be
tracked according to their proficiency in writing standard English,
and if success in the tracks is judged heavily on that basis, then
standardized test scores may have predictive value. Students who are
misclassified by such tests can be reclassified after their teacher has
had a chance to see several examples of their actual writing. But the
use of standardized tests for tracking purposes at the beginning of a
course should never lead anyone to use them to evaluate "writing
ability" at the end of the course. That would be like using a ruler to
measure the artistic quality of a painting.

What actually is included in commercially available tests that is
purported to measure "writing ability"? Usually tests consist almost
entirely of items concerned with sentence structure, word meaning,
and mechanics. They never attempt to cope with a student's ability
to select a subject, pursue a specific intention, and effectively
address a particular audience. Occasionally, however, a test will
include items which relate to a student's ability to judge the
organization and substantive details of a piece of writing, even though,
of course, the writing is not the student's and the prospective readers
are usually not identified.

Three commercially available standardized tests which claim to
measure "writing ability" were examined to see how many items
concern aspects of writing which extend beyond the confines of a
sentence, other than the reference of a pronoun to an antecedent in a
preceding sentence. The three are Tests of Academic Progress,
Composition, Form 1, Houghton Mifflin Company, 1964; Basic
Skills System Writing Test, Form A, McGraw-Hill Book Company,
1970; and Sequential Tests of Educational Progress, Writing, Form
2B, Educational Testing Service, 1957 (referred to below as STEP).

The McGraw-Hill test was designed for grades 9-12, the STEP test
for grades 10-12 (though forms for other levels are also available).
The Houghton Mifflin test is organized in overlapping units to be
taken by students at different levels: ninth graders take items 1-70, 
tenth graders 23-92, eleventh graders 47-117, and twelfth graders 
71-142. No judgment is implied here of the suitability of a category 
of items or the effectiveness of particular items. For example, the 
McGraw-Hill test seems to imply that a topic sentence is always the 
first sentence in a paragraph, and it seems to leave no room for the 
possibility that some well-written paragraphs have no topic sen- 
tence. Moreover, good writers would disagree about which are the 
best answers for some items in all three tests and would not care to 
choose any of the possible answers as a "best" answer for a few of 
the items. 
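
The overlapping units of the Houghton Mifflin test can be made
concrete with a little arithmetic. The Python sketch below is
illustrative only: the item ranges are those stated above, while the
names `ITEM_RANGES` and `shared_items` are assumptions introduced
here for the example.

```python
# Item ranges per grade as stated in the text for the Houghton Mifflin test.
# range() excludes its upper bound, hence the "+ 1".
ITEM_RANGES = {
    9: range(1, 70 + 1),
    10: range(23, 92 + 1),
    11: range(47, 117 + 1),
    12: range(71, 142 + 1),
}


def shared_items(grade_a: int, grade_b: int) -> int:
    """Count the items that both grades are asked to take."""
    return len(set(ITEM_RANGES[grade_a]) & set(ITEM_RANGES[grade_b]))


for grade, items in ITEM_RANGES.items():
    print(f"grade {grade}: {len(items)} items")

for lower in (9, 10, 11):
    upper = lower + 1
    print(f"grades {lower} and {upper} share {shared_items(lower, upper)} items")
```

On these figures each grade takes about seventy items, adjacent grades
share roughly two thirds of them, and ninth and twelfth graders have
no items in common.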

An interesting question arises as one examines the tests. What is
the difference between a test of reading ability and a multiple-choice
test of "writing ability"? If one did not wish to consider a student's
ability to choose "standard" items in sentence structure and
mechanics, could a student's success in a composition course be
predicted just as well by using a good reading test instead of the tests
considered here?

References

Braddock, Richard, Lloyd-Jones, Richard, and Schoer, Lowell.
Research in Written Composition. Urbana, Ill.: National Council
of Teachers of English, 1963, pp. 40-45.

Diederich, Paul B. "How to Measure Growth in Writing Ability."
English Journal 55 (April 1966): 435-449.

Godshalk, Fred I., Swineford, Frances, and Coffman, William E.
The Measurement of Writing Ability. Research Monograph No.
6. Princeton, N.J.: College Entrance Examination Board, 1966.
Also see the reviews of this publication and the response of its
authors in "Roundtable Review," Research in the Teaching of
English 1 (Spring 1967): 76-88.

Sherwood, John C., et al. "Terminal Report of the Committee on
Testing." College Composition and Communication 17
(December 1966): 269-272.






College English Placement Test (CEPT). Oscar M. Haugh and
James I. Brown. Boston: Houghton Mifflin Co., 1969.

The manual lists these four purposes for CEPT: (1) to measure
reliably and quickly a student's ability to use the language effectively;
(2) to provide accurate information for placing a student in the kind
of composition class best suited to individual needs and abilities; (3)
to indicate specific language areas needing further attention and
study; and (4) to provide insights bearing on the entire process of
written composition, from the selection of an appropriate subject to
the final proofreading.

In order to achieve these purposes, the authors have constructed a
106-item, multiple-choice test and assignments for two short
compositions that cannot be objectively scored. The objective test can be
given independently of the writing and requires forty-five minutes.
Coverage is unusually broad, including size of subject for a
composition; arrangement of ideas; transitions; selection of most effective
sentences; vocabulary (analogies, appropriateness to context);
grammar of an irregular verb; and mechanics (capitalization,
punctuation, spelling, forms of words).

Estimate of Validity and Usefulness 

This test is much better than most. It should accomplish its des- 
ignated purposes extraordinarily well. Items are clear, carefully 
worked out, and sometimes imaginative. Unlike many other tests, it 
is not grossly overbalanced in the most easily measured areas such as 
spelling and punctuation. It gets at important facets of writing,
such as arrangement of ideas, and largely eschews trivia. Tests as
good as this can be constructed for the lower schools, but few of
them exist.



Content Evaluation Series Language Arts Tests, Composition Test.
Form 1. Leonard Freyman (Kellogg W. Hunt, series editorial
advisor). Boston: Houghton Mifflin Co., 1969. (See Hook review p. 103
for Language Ability test and Purves review p. 133 for Literature
test.)

The aim of this test, covering grades 7-9, "is to assess the ability of
the student to manipulate his language effectively, that is, to
express himself correctly, clearly, and forcefully in a series of
interrelated, meaningful sentences organized to bring out a central point.
To realize this goal, the author has developed the test around the
three principles of classical rhetoric: invention, organization, and
style. Through a series of interesting, thought provoking exercises,
the author involves students in practical composition situations. To
the degree that the student can meet the demands of those
situations, he reveals his ability to express himself on a given topic."

The sixty items in this forty-minute test are considerably different 
from those in most composition tests. Here are questions on size of 
topic, arrangement of subtopics, organization of sentences in a para- 
graph, transitions, and relative effectiveness of various statements of 
the same idea. But there are not the usual ones about choosing
between her and she or did and done or a semicolon and a comma.

Estimate of Validity and Usefulness

This is one of the best composition tests for the junior high school,
because it emphasizes important matters rather than the usual
inconsequentials. A student who does well on this test is ready for most
writing assignments he or she is likely to get in high school; in fact,
he or she may be more ready for college writing than many college
freshmen are. This does not mean that the test is too difficult. It is
not, for properly taught students. English courses that emphasize
the kinds of writing instruction implied by this test will adequately 
prepare students not only for the test but — much more important — 
for later writing.



McGraw-Hill Basic Skills System Writing Test. Form A. Alton L.
Raygor. Monterey, Calif.: McGraw-Hill Book Co., 1970. (See Hook
review p. 104 for Spelling and Vocabulary tests.)



The three fifteen-minute segments of this test are devoted to
Language Mechanics, Sentence Patterns, and Paragraph Patterns. For
the first of these (thirty items), bits of a composition are underlined,
and students are to indicate whether they find in each an error in
capitalization, in punctuation, in "grammar," or no error. The
second part (twenty-six items) asks for identification of sentence
fragments and simple, compound, and complex sentences; for the
"grammatically correct" sentence in a group of four; for the
sentence in a group of four that shows parallel construction; and for the
most appropriate transition between sentences. The third part
(fifteen items) asks students to choose the best topic sentence for a
paragraph; the sentences best developing a given topic sentence; the
best concluding sentence; the best sequence of sentences in a
paragraph; and the best places to start new paragraphs.

Estimate of Validity and Usefulness

The section devoted to paragraphs is ingenious and useful. It
measures students' mastery of important concepts. The section on the
sentence is a peculiar amalgam. After ten not very useful
identifications of sentence types (more than a third of the whole section), there
are five groups of four sentences, of which precisely one is said to be
correct in each group. This reviewer, however, finds two
unimpeachable sentences in one group; in another, there is a grammatically
correct sentence, but the punctuation is unquestionably faulty. Sev- 
eral of the sentences in the parallel construction group are pretty 
strained. The section on choice of transitions is well conceived but 
imperfectly executed. In summary, measurement of students' mas- 
tery of sentence patterns apparently entails only identification of a 
few sentence types plus recognition of satisfactory relative pronouns, 
parallel structure, and transitions; in this test, some of the items 
designed to measure those elements are poorly executed. 

Worse, though, is the section on Language Mechanics. Here is an 
exact parallel of one part: 

Did you ever seen the story of Mr. Hyde on television. It stars
Peter Lorre as the infamous monster. Complete with gigantic
teeth and slobbovian accent

Students are to indicate errors in capitalization, punctuation, and 
grammar, or may mark "No error." Number 1 is a grammatical 
error, but so farfetched as to be a worthless item. Number 2, the an- 
swer sheet says correctly, is an error in punctuation. Number 3, says 
the answer sheet, is a grammatical error. Why, in heaven's name? 
Number 4 is called an error in punctuation, as it is, but why penalize 
the student who says it is an error in capitalization? Critical read- 
ing by a few experts in English could have quickly eliminated the 
serious flaws in this test. 






Cooperative Sequential Tests of Educational Progress (STEP),
Writing. Forms 1A, 1B, 2A, 2B, 3A, 3B, 4A, and 4B. Princeton, N.J.:
Educational Testing Service, 1957. (See Hook review p. 109 for
Listening test.)



Other tests in the complete battery include reading, listening, essay
(separate from the Writing test), social studies, science, and
mathematics. The STEP tests are achievement tests that focus "on
skill in solving new problems on the basis of information learned,
rather than on ability to handle only lesson material."

The Writing test, says one of the several manuals accompanying
STEP, "Measures ability to think critically in writing, to organize
materials, to write material appropriate for a given purpose, to write
effectively, and to observe conventional usage in punctuation and
grammar. Materials were selected from actual student writing in
letters, answers to test questions. . . ."

The test items for all grades are intended to go far beyond the 
usual emphasis on mechanics, or beyond even mechanics and 
organization. The proportion of items classified "on the basis of the
major responses they demand" is as follows:

                      Forms for              Forms for
                      Grades 4-6 (percent)   Grades 7-14 (percent)

Organization                  30                     20
Conventions                   20                     20
Critical thinking             15                     15
Effectiveness                 20                     30
Appropriateness               15                     15

As these figures show, the test designers believed that in high school 
and college more attention should be given to effectiveness, less 
attention to organization. The Teacher's Guide classifies each item
by placing it in one, and sometimes two, of these five categories. 

Divided into two thirty-five-minute segments, each Writing test
consists of sixty multiple-choice items. "The passages on which
groups of items are based are drawn largely from materials actually
written by students in schools and colleges — assignments which by 
and large were graded poor or failing." Typically the questions are 
of this sort: "Which of the following would be the best version of
Sentence X?" But there are many variations, such as "Which 
should come first?," "Which two parts of Paragraph 1 belong in
the same sentence?," and "Which of these sentences contains a
misspelled word?"
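
The percentages in the table and the sixty-item length of each Writing
test together imply a rough number of items per category. The Python
sketch below works that out. It is an approximation under stated
assumptions: the names `PROPORTIONS`, `ITEMS_PER_TEST`, and
`approximate_counts` are introduced here for illustration, and because
the Teacher's Guide sometimes places one item in two categories, the
actual counts on any form may differ.

```python
# Percentage of items per category, as given in the table in the text.
PROPORTIONS = {
    "Organization": {"grades_4_6": 30, "grades_7_14": 20},
    "Conventions": {"grades_4_6": 20, "grades_7_14": 20},
    "Critical thinking": {"grades_4_6": 15, "grades_7_14": 15},
    "Effectiveness": {"grades_4_6": 20, "grades_7_14": 30},
    "Appropriateness": {"grades_4_6": 15, "grades_7_14": 15},
}

ITEMS_PER_TEST = 60  # each Writing test has sixty items, per the text


def approximate_counts(level: str) -> dict:
    """Convert category percentages into approximate item counts."""
    return {
        category: ITEMS_PER_TEST * shares[level] / 100
        for category, shares in PROPORTIONS.items()
    }


for level in ("grades_4_6", "grades_7_14"):
    # Sanity check: the percentages in each column should total 100.
    column_total = sum(shares[level] for shares in PROPORTIONS.values())
    print(level, column_total, approximate_counts(level))
```

Read this way, the shift the designers made between the lower and
upper forms amounts to moving about six items from organization to
effectiveness.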

Estimate of Validity and Usefulness

These are better tests than most. All five areas of emphasis listed 
above are important, but too few tests pay adequate attention to any 
of them except conventions (mechanical correctness). Many of the 
items are creatively structured. The degree of difficulty of items 
appears suitable to the designated grade levels. Use of student
writings as points of take-off is much better than the use of concocted
passages. 

One weakness is that in one or two items in each test, more than a
single answer could be reasonably defended. Another is that errors
and infelicities in the student-written passages frequently go
unmentioned. And, as is true of any writing test that relies on indirect
measurement, one cannot be sure that the person who scores high on this
test will be a good writer; he or she is unlikely to be a very bad writer, yet
proof of ability to edit someone else's writing does not necessarily
prove that the test-taker can write well. Perhaps, indeed, as an
earlier reviewer has suggested, the STEP Writing test is really a
measure of general scholastic aptitude rather than of writing ability; its
high correlations with tests designated as scholastic aptitude tests
tend to support that belief.

The Teacher's Guide provides an unusual service: suggestions for 
using the test results to aid student learning. Class discussion of dif- 
ficult items is recommended (most standardized tests are hush-hush; 
students take them and may never hear about them again). Also, the
Guide suggests that classroom instruction can be geared to the needs
of a specific class by examining the kinds of difficulty most common
in the class. It suggests further (mirabile dictu!) that rewriting of the
prose passages in the test provides a useful writing exercise. The
Guide itself is well and interestingly written ("You may save yourself
some time and fury if . . ."), something seldom characteristic of test
manuals.






Literature Tests

Alan C. Purves



One of the most noticeable aspects of this section is the small num- 
ber of tests available in literature despite its prominence in the 
school curriculum. The reasons for this phenomenon might include 
both the diversity in the curriculum and the intractability of the
subject matter. Literature is avowedly not easy to make up tests about,
as many of these tests demonstrate.

We have divided the tests into two groups: those dealing with the
recall of information about selections already read (who wrote
what? what character appears in what?) by the student, and those
dealing with the student's understanding of works just read (usually
with the text available to the student). The first kind of test suffers
because of the transitory nature of many selections in the
curriculum; most of the tests reviewed in this group need updating. A
more serious problem is that of the diversity of offerings, even were
all the selections current. Macbeth is taught in only 70 percent of the
schools. What do we know of the pervasiveness of My Name Is Aram
or "Snowbound"? National tests of recall must then be reviewed
carefully by a school and the results viewed within the context of the
children's opportunity to learn each of the items on the test.

The second kind of test deals with reading ability and acumen, 
with comprehension and interpretation. As such, the test is similar 
to reading tests, save in the restriction of type of passage or selection. 
Many tests are carefully made and statistically sound. Their pitfalls 
include tendencies to deal only with low-level inferences (vocabulary,
identification of speaker, and the like) and to try to force a single
interpretation on the test-taker. An additional pitfall is neglect, in
most of the tests reviewed, of the emotional and aesthetic aspects of
our reading of literary works. Few questions deal with how students
like a work, how it makes them feel, and what image it creates.
Standardized tests like these, therefore, should not form the whole of
an evaluation program. As a part of that program, these may be
useful in varying degrees, as the reviews show.




Tests of Knowledge and Recall



Stanford Achievement Test (SAT), High School Arts and
Humanities Test. Form X. Eric Gardner, Jack Merwin, Robert Callis, and
Richard Madden. New York: Harcourt, Brace & World, 1961. (See
Hook review p. 110 for SAT Primary Level II, Intermediate Level I,
and Advanced Level Tests.)



Containing sixty-five four-choice items, the test covers literature,
art, music, philosophy, and film. There is one passage for some
analysis. The test is accompanied by an extensive manual describing the
trials done in 1963 and the standardization done in 1965. The mean
score for grade 9 was 29; grade 10, 32; grade 11, 36; and grade 12,
39. A college preparatory group scored somewhat higher.

Evaluation 

This test requires students to recall biographical and historical
information as well as factual information about artistic works and
critical terms. Students are asked to identify such diverse items as
van Eyck, Tanglewood, Taj Mahal, "numbers," and The Catcher in
the Rye. Most of the items are clear and unambiguous, although 
they do not deal with much more than quiz show information. There 
is no real analysis or interpretation, no measurement of any real con- 
frontation with an art work. Given this limitation and the one that 
no items deal with art by minority groups (save one item on Dixie- 
land), the test is adequate. 



Hollingsworth-Sanders Junior High School Literature Test. Test I,
forms A and B; and Test II, forms A and B. Leon Hollingsworth and
M. W. Sanders. Emporia, Kans.: Bureau of Educational
Measurements, 1964.



There are four tests for grades 7 and 8, each with 105-150
four-choice questions. The tests rely on identification of quotations,
characters, and titles. There is also a vocabulary section. The tests
were normed in 1962 and 1963, using a national sample.

Evaluation 

This test is now hopelessly inappropriate for its audience. It requires 
a recall of passages, titles, characters, authors, and incidents of 
works no longer taught or, if taught, not with any consistency in the
junior high school. Often the works referred to are excerpts which
are given nonce-titles in the texts. The authors base their tests upon
"Homework," "Fast Ball," and "Mr. Chairman." Many of the
items are ridiculous: "Sometimes we Americans refer to 1. Lord
Byron, 2. Alfred Tennyson, 3. Edgar A. Poe, 4. William Shakespeare,
or 5. A. E. Housman as the master poet of all times." The
quotations given are often obscure. There are no items dealing with 
works by minority group writers. 



Hoskins-Sanders Literature Tests. Test I, forms A and B; Test II,
forms A and B. Thomas Hoskins and M. W. Sanders. Emporia,
Kans.: Bureau of Educational Measurements, 1964.



There are four 150-item, multiple-choice tests dealing with writers,
titles, and literary terms; the tests are designed for grades 9-13. They
cover British and American literature and give some attention to
world literature. The authors claim that the test material deals with
"35 classical selections." The tests were normed in 1963; the norms
indicate that the 99th percentile at the 12th grade level averaged 90
out of 150 right.

Evaluation 

As the norming data reveal, the series deals with details from a
variety of literary works, both famous and trivial, measuring a student's
recall of a large number of selections that were standard in the
curriculum over a generation ago. The test ranges from Bede to
Booth Tarkington and calls for knowledge of authors, titles,
characters, events, and genres. It does measure breadth but in a
hopelessly antiquated fashion. There is no reference to literature of
minority groups and no attempt to measure analysis and
interpretation.



Iowa High School Content Examination for High School Seniors
and College Freshmen. Form G. M. Ruch, B. Stuit, and H. A.
Greene. Iowa City: Bureau of Educational Research and Service,
1943.



As part of a several-subject test, there is a section of 100 four-choice
items, mostly dealing with literature, although there are a number of
items on usage and vocabulary. The test was normed in 1943 on
samples of 11th and 12th grade students "scattered through the
nation." The examination samples "important facts and concepts
which a high school student could be expected to know upon
completion of his high school course."

Evaluation

Without a doubt, this is one of the worst tests still in print. All of the
items require recall of information. Most of the items are trivial (the
authorship of Mr. Britling Sees It Through), unanswerable ("(1)
Shakespeare (2) Pope (3) Eliot (4) Tennyson developed the idea of the
immortality of the soul in his poems"), or fraught with mistakes
(Scrooge's partner is referred to as Marlowe). The material is dated
for high school students and there is no Black literature. The test
does not measure the comprehension it purports to.



Illinois Tests in the Teaching of English, Knowledge of Literature,
Competency test C. William H. Evans and Paul H. Jacobs.
Carbondale, Ill.: Southern Illinois University Press, 1969 and 1972. (See
Hook review p. 102 for Knowledge of Language test.)



The test, designed for prospective teachers, consists of 143 items,
121 of which ask the test-taker about literary works, periods, genres,
and critical terms. The remainder deal with the critical reading of
prose and poetry. Included in the identification and classification
section are a number of works of juvenile and adolescent literature
as well as Black literature and world literature. The test was tried
out on prospective teachers, but there are no norming data.

Evaluation 

I had a chance to review many of the items for this test between
tryout and final assembly of the test. Then as now, I was struck by the
ingenuity of some of the item types, particularly the classification 
items, an example of which follows: 

1. Hamlet               A. Moby Dick
   Macbeth              B. The Scarlet Letter
   Romeo and Juliet     C. King Lear
                        D. Pride and Prejudice
                        E. David Copperfield
   REASON: All were written by the same author.

The reasons extend over form, theme, authorship, period, and 
genre. In general, this part of the test measures a prospective teach- 
er's knowledge of facts about books and authors well, although some 
items may already be dated. The critical reading section, although 
brief, is adequate, and the last set of items asking the test-taker to
compare and contrast three poems about nature is well done. A
person who does well on both parts of the test would certainly
demonstrate a strong knowledge of literary facts and an ability to read
with acumen. Whether there is more to literary study than those two
abilities is another matter, but for its aims, this test succeeds. 



Literature Tests, Objective and Essay. Fifty and 100 question series,
and essay series. Logan, Iowa: The Perfection Form Co., 1946-1970.



Written by a variety of people, the objective series consists of 110
one-hundred-item tests and ninety fifty-item tests on some of "the
greatest books ever written" (Aeneid to The Yearling). There are
also essay tests on all 140 books. I have examined only those tests on
Lord of the Flies, The Bridge of San Luis Rey, Pygmalion, Moby
Dick, The Red Badge of Courage, The Scarlet Letter, Treasure
Island, Tom Sawyer, Silas Marner, The Return of the Native, The
Odyssey, Oedipus Rex, Huckleberry Finn, The Old Man and the
Sea, Pride and Prejudice, Evangeline, Hamlet, The Human Comedy,
Julius Caesar, Macbeth, Our Town, The Merchant of Venice, and
Great Expectations. The tests consist of true-false, matching, fill-in,
and multiple-choice questions. There is no information on norms.

Since the tests are quite similar, I will describe the one on Huckleberry
Finn (by N. R Falk, 1950 and 1965) as typical. It contains five
matching questions on the book, ten matching questions on character
identification, twelve questions on animals and objects, thirty
true-false questions, twenty-three multiple-choice questions, twelve
short-answer questions, and eight completions. The items deal almost
entirely with recall of details.
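
As a quick check on the description above, the section counts can be
tallied to see how the test is weighted. This is only an illustrative
sketch: the section labels are paraphrased from the review, and the
function names are hypothetical rather than anything supplied with the
test.

```python
# Section counts as reported in the review of the Huckleberry Finn test.
# Labels are paraphrased for this sketch.
SECTIONS = {
    "matching (book)": 5,
    "matching (character identification)": 10,
    "animals and objects": 12,
    "true-false": 30,
    "multiple-choice": 23,
    "short-answer": 12,
    "completion": 8,
}


def total_items(counts):
    """Return the total number of items across all sections."""
    return sum(counts.values())


def share_by_type(counts):
    """Return each section's share of the whole test as a fraction."""
    total = total_items(counts)
    return {name: n / total for name, n in counts.items()}


print(total_items(SECTIONS))  # 100
```

The counts sum to 100, which agrees with the description of a
one-hundred-item test.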

Evaluation 

The test violates most principles of test construction. The matching
questions, for example, contain equal numbers on both sides so
that the test is one of elimination. The true-false questions contain
matters of opinion (e.g., "Tom had an ingenious mind"). The fill-in
questions presume a single answer to questions like, "Who tried to
teach Huck?" Furthermore, the test deals with trivia. Does it matter
that a banjo and a black shirt are objects that do not appear in
the book? The test neglects any critical or thoughtful reading of the
book. It forecloses all matters of controversy. The test on Hamlet,
for instance, decides that it is true that Hamlet is not insane, a matter
critics are still debating. Although the test might have some
slight value to the teacher who wants to check on whether students
have read these books, most teachers would do better with a five-
minute, home-dittoed quiz. Schools which buy these tests would
have no defense were a taxpayer's suit instituted charging the school
with misappropriation of funds. The wastage would be not only one
of money but of the time and energy of teachers and students.
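
The objection to matching questions with equal numbers on both sides
can be made concrete with a little arithmetic. The sketch below assumes
blind guessing in which each option may be used at most once; the
function name and the spare_options parameter are illustrative
assumptions, not part of the test or of this review.

```python
from math import perm


def chance_all_correct(unknown_items, spare_options=0):
    """Chance of matching every unknown item correctly by blind guessing.

    unknown_items: items the test-taker cannot answer from knowledge.
    spare_options: extra, unused options in the answer column (assumed).
    """
    # Number of ordered ways to assign distinct options to the unknown items.
    arrangements = perm(unknown_items + spare_options, unknown_items)
    return 1 / arrangements


# One item left unknown and no spare options: elimination settles it.
print(chance_all_correct(1))  # 1.0
# The same item with two extra distractors becomes a one-in-three guess.
print(chance_all_correct(1, spare_options=2))
```

With equal columns, a student who knows all but one pairing gets the
last one free, which is the sense in which such a test is one of
elimination.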



Tests of Critical Reading and Interpretation



The NCTE Cooperative Test of Critical Reading and Appreciation,
A Look at Literature. Forms A and B. Roderick A. Ironside. Prince- 
ton, N.J.: Educational Testing Service, 1968. 






There are two forms of this test, which is intended for fourth through 
sixth grade students. The forms are parallel, each containing fifty 
four-choice questions dealing with fourteen passages of prose and 
poetry. The authors have categorized the questions as dealing with
translation, extension, and awareness (corresponding to comprehen-
sion, interpretation, and literary analysis). The two forms have been
statistically equated and correlated with the Reading test, and
norms have been created based on a sample of some 500 students for
each group. The test is designed primarily as a research instrument 
although it can also be used as an instructional device and possibly 
as an evaluation device. 

Evaluation 

I took part in the initial item review for the test, so that some prej- 
udice might be inferred. Nevertheless, the test does serve as an ad- 
equate measure of comprehension of literary texts. The selections 
are varied, including many classics of children's literature (Astrid
Lindgren, Laura Ingalls Wilder, Joseph Krumgold, and Robert
Larson are among the authors represented). There is little ethnic or
urban literature, however. The questions do range over a variety of
abilities related to the reading of literature, and the choices
represent the work of careful test construction and review. A weakness
of the test is that not every passage is treated with the gamut of
questions, so that the test-taker is faced with not particularly coherent
sets of questions and therefore is not asked to synthesize his or
her understanding. As an evaluation device, A Look at Literature 
must be supplemented with some other measures — essays or other 
written projects. 



Content Evaluation Series: Language Arts Series, Literature Test.
Form I. Ruth Reeves (Kellogg W. Hunt, series editorial advisor).
Boston: Houghton Mifflin Co., 1969. (See Hook review p. 103 for
Language Ability test and Braddock review p. 122 for Composition
test.)



The test, for use in grades 7-9, consists of forty-five four-choice items
based on a reading of five passages: three poems, a selection of
nonfiction, and one of drama. The questions deal with form and content.
There was no norming information available.






Evaluation

Part of a series, this test measures the ability to read, analyze, and
interpret literary selections. The test does measure higher mental
processes, and it does follow the "new critical" line appropriate for
objective measurement. Occasionally the items have overlong stems,
thus making them answerable without recourse to the text, and
occasionally the items are not of great significance to an understanding
of the text. But nonetheless the test is sound. The test-makers
are pretentious in claiming that the test measures a broad
spectrum of abilities, but it is better than most in this respect. The
situations used are a bit crusty and perhaps inappropriate to inner-city
youth; there is no Black literature.



Cooperative Literature Tests. Forms A and B. Princeton, N.J.:
Educational Testing Service, 1972.



This series of multiple-choice, open-book tests deals with selected
major works of literature: A Tale of Two Cities, The Old Man and
the Sea, Julius Caesar, Macbeth, The Scarlet Letter, Silas Marner,
Oedipus the King, The Bridge of San Luis Rey, Moby Dick, The Red
Badge of Courage, Our Town, The Return of the Native, Pygmalion,
The Merchant of Venice, Great Expectations, The Odyssey, Pride
and Prejudice, Huckleberry Finn, and Hamlet. For each work there
are two tests, each of forty four-choice items. Although reliabilities
and norms have been determined, they were not available at the time
of review. The two tests may be used as a pretest and a post-test, or
as a study test and a final test.

Evaluation

These are superbly printed tests, and their quality matches their
covers. The items range over facts, character interpretations, style,
form, theme, and mood of the works. To take Pride and Prejudice as
an example, the test opens with disingenuous items:

"It is a truth universally acknowledged, that a single man in
possession of a good fortune must be in want of a wife." All of the
following statements accurately describe this first sentence of the
novel EXCEPT

1. It leads the reader on.

2. It establishes the ironic tone of the novel.

3. It says the reverse of what it means.

4. It states a profound and universal truth.

Although it is an easy question, answered in part by a knowledge of
irony, and although it may be challenged by an item-writer, this item
does set forth the tone of the test. It is serious, playful, and intellectual.
The items that follow continue in this manner, with questions
like, "In order for Elizabeth and Darcy to be brought together, it is
necessary that . . ."; "How does the elopement of Lydia and Wickham
advance the plot?"; and "The characters chiefly attacked by
the author's humor are those who . . ." The students who take this
test are challenged to read with care and discernment.

The virtue of two forms for each work is that of providing
opportunities to measure growth. A fitting use would be to give one
form at the beginning of an instructional unit, and to use the test to
begin discussion. The second test could form part of a final evaluation.
The two forms could also be used for action research. The
forms are parallel without being redundant.

One might criticize these tests for being overly "new critical." But 
granted that perspective, the questions are excellently wrought. A 
more serious criticism is that of the order of items in the test. The 
questions seem arranged more in order of difficulty than according 
to the chronology of the work or to an order ranging from detail to
generalization. Nonetheless, teachers could profitably use these tests
either as unit tests or as springboards to essays and projects. 



Tests of Academic Progress, Literature. Form S. Oscar M. Haugh
and Dale P. Scannell. Boston: Houghton Mifflin Co., 1971. (See
Hook review p. 113 for Composition test.)



The test contains 126 four-choice items based on a reading of passages
of prose and poetry, including a passage from Taming of the
Shrew. The test is printed so that the ninth grade does passages 1-5,
tenth 3-7, eleventh 5-10, and twelfth 6-12; a teacher may thus compare
grade levels. Items include vocabulary, comprehension, literary
terms, and dating and geographical placement of the passages.
Norming information was provided on norms of from 1113 to 1690,
in Idaho and Montana, as well as a national sample. Norming data
are given in terms of percentiles.
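
The overlapping passage ranges are what make cross-grade comparison
possible, and the overlap is easy to work out. The following is a
minimal sketch using only the ranges stated above; the names
GRADE_PASSAGES and shared_passages are hypothetical and do not come from
the test manual.

```python
# Passage ranges per grade as stated in the description (inclusive).
GRADE_PASSAGES = {
    9: range(1, 6),     # passages 1-5
    10: range(3, 8),    # passages 3-7
    11: range(5, 11),   # passages 5-10
    12: range(6, 13),   # passages 6-12
}


def shared_passages(grade_a, grade_b, layout=GRADE_PASSAGES):
    """Return the passages taken by both grades, in ascending order."""
    return sorted(set(layout[grade_a]) & set(layout[grade_b]))


print(shared_passages(9, 10))   # [3, 4, 5]
print(shared_passages(9, 12))   # []
```

Adjacent grades share several passages, while the ninth and twelfth
grades share none, so any direct comparison between those two would have
to run through the intermediate grades.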

Evaluation 

This test alternates between reading comprehension items and items 
of literary classification and interpretation. Often the former are 
not highly significant and the latter are of varying quality (for ex- 
ample, the identification of Shelley as the author of "Mutability" 
seems inappropriate). Some of the items rely on the students' having 
read the work before taking the test or knowing extraneous informa- 
tion. There is no coverage of materials by minority writers. There is 
no coherence to the sets of items for each passage so that the teacher 
cannot discriminate among lower and higher behaviors. As a test of 
the ability to analyze and interpret literary materials, this one is fair,
but not of the best.



Responding: Ginn Interrelated Sequences in Literature, Evaluation
Sequence. Pretests, growth tests, and diagnostic tests. Charles R.
Cooper and Alan C. Purves. Lexington, Mass.: Ginn and Co., 1973.

[Review edited by Dan Donlan, University of California, Riverside.]



The Evaluation Sequence for Ginn's Responding Series is an intricate
evaluation program based on a response grid which projects
the interrelationships between, horizontally, eight student behaviors
(creating, valuing, evaluating, generalizing, interpreting, relating,
discriminating, and describing) and, vertically, two areas of
content—the piece of literature itself (subject matter, voice, shape, and
language) and the student's response to the piece of literature (spoken,
written, and nonverbal). The five types of measures comprising
the evaluation program, placed on the response grid where the
authors deem appropriate, assess the program's objectives: (1) Attitude
Scales help in evaluating what students value in literature and
how they describe their responses; (2) Diagnostic Tests of Specific
Skills (grades 7-12), focusing on seven areas of literary understanding
(e.g., perceiving character traits, perceiving tone and
mood), indicate student abilities in interpreting and discriminating;
(3) Pretests and (4) Growth Tests (both grades 7-12), each containing
thirty-six multiple-choice items on literature excerpts, hopefully
unfamiliar to the student, assess student abilities in interpreting,
relating, and discriminating; and (5) Teacher-made Questionnaires,
including inventories in student interest and class climate, supplement
other instruments and provide information not assessable elsewhere.
The authors indicate openly the areas the Evaluation Sequence does
not measure. A Guide to Evaluation is the evaluator's handbook,
containing not only pertinent information on uses of the various
instruments, but also samples for teacher-designed instruments and
helpful essays on literary criticism, the nature of response, and
devices for evaluating response.

Evaluation 

One can only be impressed by the thoroughness of the Cooper-Purves
evaluation program, which attempts to assess, in a variety of
ways, both the emotional and intellectual responses of students. The
Evaluation Sequence is tightly, logically generated, first from five
stated assumptions about literary response, second from a response
grid, and third from a series of general and specific, verifiable
objectives. In fact, one might argue that A Guide to Evaluation is a
textbook on student response, and herein may be the program's
principal problem.

As with many theoretical documents which supply specific examples
to illustrate theory, teachers may "snatch up" this or that
without understanding the underlying assumptions of student response,
ultimately defeating the purpose of the theory. The authors
devoutly and repeatedly caution teachers about the use of the various
instruments. For instance, they emphasize the diagnosis of
growth and deemphasize grading. Yet, I wonder if many teachers
will not assign letter grades to each evaluative experience. In some
multiple choice tests, a fifth option permits the student to write in
his or her own choice. Despite the authors' explanations, I wonder
how many teachers will take the time to deal with variances from
"the key," or, for that matter, to implement the ingeniously designed
"attitude sort," the "class climate inventory," and the forms
dealing with observation and description of responses. In effect,
teachers using Ginn's Responding Series need a developmental,
articulate inservice program to deal with the subtlety, richness, and
complexity of transactive response. In other words, there's beauty
in the Cooper-Purves program, but it's not there for the mere taking.






3 



Problems and Recommendations

Alfred H. Grommon



In Part Two of this report, reviews of many different standardized
tests have pointed to specific problems that English teachers face as
they attempt to reconcile the demands of standardized tests with the
attitudes and skills that are taught in their classrooms. To broaden
the perspective, consider the four general kinds of problems detailed
by Henry S. Dyer in the 1971 speech cited earlier, "The State Assessment
Survey."

One problem has to do with lack of communication among various
groups within a state that may be working independently on
trying to devise some sort of assessment program. . . . This lack of
communication is likely to become a breeder of conflict and confusion,
and the conflict and confusion threaten to neutralize the
whole effort. . . .

A second problem that is beginning to crop up in those situations
where statewide testing programs are under consideration
has to do with the manner in which the results will be used in the
allocation of state funds to local school districts. . . .

A third problem is the well known one of how to protect the
confidentiality of the information being gathered in the assessment
process — especially when this information includes data
supplied by pupils about the economic and social conditions of
their families. The mere fact that such information is being gathered
at all — regardless of efforts to guard the anonymity of the
children who supply it — tends to generate storms in parent groups
and state legislatures. And these storms are often exacerbated by
headlines in the local press.

Finally, in connection with the efforts to formulate meaningful
educational goals around which to build an assessment program,
there is the perennial problem of confusion between ends and
means, between process and product, between pupil performance
objectives, staff performance objectives, and institutional performance
objectives, between management by objectives and management
by prescription. . . . Until we can find some better methods
than we now have of getting people unconfused in this matter
of goals and objectives, such assessment programs as may eventuate
in the next few years are not likely to have much substantive
impact on the improvement of education in any state.

What Teachers Can Do 

The 1973 Educational Testing Service surveys reveal that in many 
states administrators and teachers already are well aware of the 
problems mentioned above and are striving to resolve or at least reduce
them. Nevertheless, concerned teachers, in the interest of being
informed about the testing programs in their state and of improving
communication among involved groups, should obtain the following
kinds of information:

— copies of the state's laws in which educational accountability is 
mandated; 

— other materials showing the schematic design of the state's system
of educational accountability;

— copies of the stated educational goals; 

— any documents containing results of appraisals of the state's 
program of testing and assessment; 

— information about the creation and use of criterion-referenced
tests;

— reports of how the results of achievement tests are used as a 
basis for educational decisions for the state and for local school 
districts; 

— information about what the state is doing to help local school 
districts develop their own programs of accountability.

These kinds of information should be available in every state. The 
better informed teachers are, the more effective their communication
with other groups involved in the programs, including parents.
The more significant their participation in the program, the less 
likely they are to play the role of being merely one of its agents, or its 
victims. 

Though problems in selecting and using standardized tests and
then interpreting and disseminating results may be plentiful and
perplexing, some of the encouraging trends in aspects of statewide
programs of testing identified earlier and adaptations of some of the
following recommendations may help teachers of English anticipate
some difficulties, minimize the effects of others, and in general
enhance the status of their roles in the programs. As has been pointed
out already, failure in communication among participants and those
affected by testing programs is a major local and national problem.
Whatever else may be the focus of each of the following recommendations,
each is intended also to suggest or imply better communication
among persons concerned about the uses of tests.

Recommendation One: English teachers should participate in deci- 
sions about testing. 

English teachers should participate in whatever groups are appointed
to make education decisions about statewide and local programs
of educational testing. They should be involved in identifying goals,
in selecting and creating tests, in interpreting test results, in placing
those results in the context of the entire English program, in furthering
communication among various groups — state education officers
and agencies, local administrators, teachers, students, parents, news
media, other lay citizens — and in disseminating test results and
interpretations to educational authorities, students, parents, and the
local public in general.

Recommendation Two: English teachers should publicize professional
standards.

In contributing to planning sessions involving programs of educa- 
tional accountability, English teachers should bring to the attention 
of all involved the 1971 resolutions on educational accountability 
and behavioral objectives that were issued by the National Council of 
Teachers of English. These resolutions testify to English teachers' 
recognition of their responsibility to be accountable to students,
colleagues, parents, the local community, and to the wider community.
But the resolution on accountability also emphasizes that each of 
these other groups has to be held accountable, in turn, to teachers. 

Recommendation Three: English teachers must help to interpret
test validity.

There is often an assumption, sometimes faulty, that the results of
students' performance on standardized tests are necessarily directly
related to the quality of instruction they received on the particular
subject matter of the test. As was reported earlier, the most common
purpose in statewide programs of testing and assessment is to use
test results to evaluate programs and instruction. Consequently, English
teachers find themselves enmeshed in this kind of application
of standardized test results. In such a case, teachers may be able to
draw upon some of the following materials in interpreting test results
for outside agencies and the public.

Chauncey and Dobbin of the Educational Testing Service raise
the basic question: "Should tests be used to assess teachers?" In
discussing this question, they point out some complications and
implications of using the results of pupils' performances on standardized
achievement tests for the purpose of judging the effectiveness
of their teachers. They state, in part, that:

Administrators, either on their own or at the insistence of parents
and school board members, all too often judge the quality of
a teacher's instruction by the average scores earned by the teacher's
students on a standardized test. This can be far more dangerous
than even the most knowledgeable advocates of educational
measurement are likely to know. The danger lies in the fact that it
is so easy to accept test results as the only evidence of teaching
quality—when at their best, tests can yield only a small part of the
evidence necessary to make a sound judgment.

The same considerations as those used with regard to judging
the effectiveness of school systems must be made in assessing the
individual teacher. Do the tests measure an important part of
what the teacher is trying to teach? Does the teacher recognize
that they do? Is it known exactly what kinds of pupils the teacher
has to teach, his "raw material"? Are there provisions for before-
and-after assessment, so that his effectiveness will be judged by
the changes he produces? Is the teacher a member of the assessment
team, rather than its victim? Unless these critical questions
can all be answered in the affirmative, the teacher of bright and
academically favored students will be far more effective than the
teacher of the less favored children, whose very real achievements
will not be evident.

They go on to say later that "standardized tests of student achievement
are such useful teaching tools that it is often a mistake to try to
make them do double duty as measures of the teacher as well."

Further complications of any relation between test outcomes and
a teacher's effectiveness in working with "raw material" are pointed
out by Ned A. Flanders in his article, "The Changing Base of Performance-
Based Testing." He says that the current problem in measuring
educational outcomes is that "most of the tests used to measure
student learning appear to be insensitive to differences in teaching
behavior." One might add they seem to be insensitive also to differences
in pupils' cognitive styles. Later in his article, Flanders states:

One difficulty with measures of learning outcomes has been an
over-emphasis on the subject matter achievement of students.
There are two aspects of this problem. First, using a test of subject
matter as the only criterion of learning is inadequate, because
student learning includes much more. For example, one might
nominate staying in school and not dropping out; learning to like
schooling, the process of learning, and the teacher, in contrast
with hating them; gradually learning how to be more self-directing
and independent; learning how to make moral and ethical
judgments. Any of these may be more important measures of
teaching than are scores on reading, writing, and arithmetic.
Second, given the common focus on a subject matter and a research
design consisting of a pretest, teaching/learning, and post-test,
it was soon discovered that posttest achievement is at least 10
times more strongly associated with pretest scores than it is with
any measure of teaching. . . . Another problem is that standardized
achievement tests are designed to be insensitive to the influence
of a particular teacher and reflect, instead, the total development
background of the student. . . . In spite of these difficulties,
it is possible to analyze teaching effectiveness, but it will require
some rethinking, some innovations, and some retooling with respect
to the criterion measure.

Lee J. Cronbach said something particularly relevant here: "Different
children learn different things from training."

Teachers finding themselves evaluated on the basis of their pupils'
performances on standardized tests in English may well find, in the
preceding statements, leads for helping others see the limitations
and dangers of drawing such inferences from test results. Such
teachers may identify also such evidence as Flanders points out
about teachers' and schools' influences upon perhaps more important
outcomes of pupils' learning.

One account of a teacher's experiences in these matters may touch
directly some experiences of English teachers and may evoke a wry
smile of recognition. In the research leading to his report, Deciding
the Future, Edmund J. Farrell received the following letter from an
English teacher on his panel of informants:

My own research has convinced me that red-inking errors in students'
papers does no good and causes a great many students to
hate and fear writing more than anything else they do in school. I
gave a long series of tests covering 580 of the most common and
persistent errors in usage, diction, and punctuation and 1,000
spelling errors to students in grades 9-12 in many schools, and the
average rate of improvement in ability to detect these errors
turned out to be 2 per cent per year. The dropout rate is more
than enough to account for this much improvement if the teachers
had not even been there. When I consider how many hours of my
life I have wasted in trying to root out these errors by a method
that clearly did not work, I want to kick myself. Any rat that persisted
in pressing the wrong lever 10,000 times would be regarded
as stupid. I must have gone on pressing it at least 20,000 times
without visible effect.

The number of teachers of English pushing wrong levers is probably 
incalculable. 

On the relation of test data to caliber of instruction, a statement
quoted earlier from the ETS survey of statewide assessments seems
appropriate here too. "It is probably safe to say that statewide
assessment will not produce any startling revelations about what can
be done by teachers with pupils to help children learn more effectively."
Though standardized tests, appropriately selected and used, can
be helpful aids in teaching and evaluation, any attempt to use the
results of such tests, as single measures, as means also for assessing
teachers can be seriously misleading.

Recommendation Four: English teachers should demand an appropriate
relationship between standardized tests and the purposes of
the entire English program.

Teachers of English should insist that whatever standardized tests
may be used are, at the outset, appropriately related to the purposes
and nature of their entire English program. They should strive also
to make sure the public, educational authorities, and the news
media see that relationship clearly.

A memorandum, "Specifications for Evaluation Strategy," prepared
by teachers of English in the Bellevue, Washington, Public
Schools illustrates what the teachers did to present the essentials of
their English program and, at the same time, provide a guide for
test-development agencies interested in responding to the "need for
an instrument and strategies to evaluate the program of English
Language Arts and Skills in the Bellevue Public Schools." The
memorandum included the following statements:

1. Purposes for an Evaluation Program

2. Specifications for an Evaluative Instrument and Strategies

3. Assumptions about the English Program

4. Experiences ("The Developmental Expectations") provided for
students in our program, K-12

5. Short Term Outcomes that students might be expected to demonstrate
after the experience-expectations have been provided.

It would appear that any evaluative instrument designed to fulfill 
these detailed specifications would have to be in keeping with the explicit
purposes of that English program: its objectives, content,
skills, and cognitive and affective experiences provided for students
in Bellevue English classes. Such a product created to meet these re-
quirements would be different indeed from commercially prepared, 
norm-referenced standardized English tests readily available from 
publishers and intended to be usable in any school, no matter what 
the special features of that community or its English program. 

Another example of a statewide project designed to help teachers
of English throughout Wisconsin is reported in the pamphlet,
Evaluation of Published English Tests, prepared as a "guide for
administrators, supervisors and teachers of English in the selection
and use of standardized tests." The pamphlet presents evaluations
of sixteen commercially prepared tests that seemed in 1966 to be the
most frequently used by English teachers throughout Wisconsin to
assess pupils' skills in aspects of spelling, vocabulary, sentence
structure, awareness of elements of grammar, and conventions of
written English. In response to questionnaires, teachers who were
experienced in using particular tests in their classes wrote evaluations
of these tests. To the summaries of teachers' evaluations were
added summaries of reviews published in The Fifth Mental Measurements
Yearbook and The Sixth Mental Measurements Yearbook,
and also the occasional review in a professional journal. In
the conclusions and recommendations, Wood lists questions teachers
and school districts ought to answer satisfactorily before selecting
and using a standardized test:






1. What portions of the content of English at the grade levels to
be tested are included in this test?

2. Is this proportionate emphasis parallel to the emphasis given
by our teachers?

3. Does this test measure what our teachers consider to be a
basic part of their curriculum? In other words, does it truly
test our curriculum?

4. Are the presented items valid? For example, are the items of
usage, punctuation, sentence corrections, and other details
consistent with what we teach?

5. What is the time required for this test?

6. How easy is it to administer? Are the directions simple and
clear?

7. How easily may the test be scored?

8. What do the scores mean when completed?

9. How are the norms derived? How extensive was the sampling?

10. How can the results of this test be followed up for the
improvement of the English program?

Today, any teacher of English or school district about to engage in a
testing program undoubtedly would ask additional questions representing
priorities apparently not raised by Wisconsin English teachers
during the 1960s. Nevertheless, the questions are sensible, practical,
useful, and still highly relevant to any consideration of standardized
tests in English.

Still another example of a statewide project is the use of the
English Language Framework for California Public Schools, Kindergarten
through Grade Twelve (California State Department of
Education, 1968) during 1972-74 as the basis for the preparation of
guidelines for designing English tests to be used in compliance with
a state law. This Framework was prepared by the California Advisory
Committee for an English Framework comprised of teachers,
supervisors, and college and university professors of English and
Education working in conjunction with thousands of English teachers
and supervisors throughout the state. The Framework was
adopted by the State Department of Education in 1968 and again in
1971. During 1974, it was revised by a statewide committee of teachers
of English and of English educators. Long though this process
has been, the Advisory Committee, nevertheless, thereby achieved a
high degree of communication statewide among English teachers,
administrators, college and university professors, and the State
Department of Education. Testimony to these benefits appeared in
subsequent developments.






In 1972, the California State Legislature, insisting upon holding
public schools accountable to the state and public, adopted a law
stipulating that in 1974-1975 a continuing program of statewide
testing on the basis of matrix sampling would be instituted in the
sixth and twelfth grades. This program was to measure the effectiveness
of pupils' written expression and their abilities in spelling. As a
crucial first step in involving English teachers and school districts in
helping the State Department of Education evolve the best possible
English tests to be administered in compliance with this law, the
State Office of Program Evaluation and Research invited representative
school and college teachers of English to serve as an English
Language Assessment Advisory Committee.

The Advisory Committee worked two years in preparing its report,
Guidelines for Designing Tests to be Used to Assess 6th- and 12th-Grade
Students' Competencies in Written Expression and Spelling.
The opening paragraphs illustrate the intent of the Advisory Committee
to ensure that whatever tests were developed must be in keeping
with the state Framework and the principles underlying the entire
guidelines:

Any California program of State-wide assessment of public
school pupils' competencies related to their command of the English
language should be founded upon certain principles presented
in three documents:

1. the English Language Framework for California Public
Schools (adopted in 1968; readopted in 1971);

2. the "Criteria for Evaluating Instructional Materials for English
and Related Studies, K-8," prepared by the English
Advisory Group of the California State Curriculum Commission,
published in the CATE Bulletin, Spring, 1971;

3. recommendations by the English Language Assessment Advisory
committee (1972-73).

The following principles, drawn from the above sources, are intended
to serve as guidelines for the California State Department
of Education Office of Program Evaluation and Research, for any
publishers of standardized tests interested in this State-wide program
in evaluation, and for representatives of the schools that will
be using tests developed to fulfill stipulations of the Greene Bill
and will be discussing this program of evaluation in their communities.

Although the tests to be administered in compliance with the
Greene Bill will be designed to assess pupils' competence in using
the English language effectively and in spelling, these separate
tests should also reflect the developing awareness of the definition
of English as a school subject and of the principle of unity and sequence
of English programs as recommended throughout the
English Language Framework. The definition of English is implied
throughout the Framework, but the following is a direct
statement of purposes of such a program:

The chief aims of the school program in English are, as the
Curriculum Commission has stated, to develop in all children
who graduate from the twelfth grade . . . competence in listening,
speaking, reading, and writing English and as much appreciation
and understanding as possible of the literature of
America, England, and the world.

To reinforce the Advisory Committee's stand that any English tests
used in compliance with this law must be consistent with the English
Language Framework, the delegates to the annual meeting of the
California Association of Teachers of English, in February 1973,
passed a resolution calling for an amendment to the State Education
Code to make sure that the law specified this relationship: ". . . to
require that statewide tests in English (language, literature, reading,
and writing) reflect accurately the principles set forth in the English
Language Framework for California Public Schools. . . ." This resolution
was then forwarded to the state legislature.

In May 1974, the State Department of Education sent to publishers
the Guidelines, including objectives, specifications, and sample
test items. In the same month, the document was sent to 200 California
school districts for their review and comment. In addition, as a
means of continuing and improving communication, the state office
periodically prepares and distributes a leaflet entitled "FEEDBACK,
Newsletter of the New California English Testing Program."

The above was an example of how one state department of education
drew upon many English teachers throughout the state to make
sure the tests used were of the best possible kind. The California
story is detailed here, and in Recommendation Seven, in order to
suggest to English teachers and administrators in other states what
may be done to improve the quality of statewide testing. In summary,
it should be pointed out again that through the initiative of
the California State Department of Education in establishing an Advisory
Committee and through the full, congenial cooperation of its
representative working with the group, the Committee was able to
draw to the attention of the State Department, the school districts
throughout the state, and the interested publishers of tests the necessity
of placing any statewide testing in English clearly within the context
of the whole program of English as represented in the English
Language Framework for California Public Schools.

Recommendation Five: English teachers should insure the confidentiality
of test results.

As has been indicated earlier, reporting to the public the results of
students' performances on standardized tests may sometimes pose
problems of confidentiality, particularly those that may encroach
upon the privacy of the individual pupil and perhaps adversely affect
a sense of self-worth. Most statewide programs are designed to
yield information about the quality of pupils' performances on the
basis of results for a school, district, or state. Tests are intended to
help evaluate programs, not individuals. To minimize these difficulties
and, in some ways, simplify statewide testing, several states have
designed programs of testing on the basis of matrix sampling, a pattern
ensuring that no pupil gets the entire test. Rather, each gets a
certain sampling of all items comprising the entire test. The various
samplings distributed in a school or district, however, do constitute
the entire test. As a result of such a sampling pattern, the level
of performance of pupils as a group, in a program or district, can be
measured; however, the evidence of how an individual pupil might
perform on the entire test is not available. The use of some pattern of
sampling is one of the recommendations made by the Michigan panel
of educators who assessed that state's program of accountability.
Yet in other kinds of local testing programs in which teachers wish
to know how well individual pupils are performing on a test of particular
aspects of an English program, they can select or create a different
kind of measurement. The feedback of test results can enable
teachers to help the individual pupil and to re-examine the English
program. Whatever the nature of the instruments, of the administration
of the tests, and of the handling of results, all must be treated
with care to protect the individual pupil's sense of self-worth.
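
For readers who find a concrete illustration helpful, the matrix sampling pattern described above can be sketched in a few lines of Python. This is only a hypothetical sketch, not the procedure of any state program: the function names, the cyclic way of dealing out items, and the data layout are all assumptions made for illustration. It shows each pupil receiving only part of the test while the group as a whole covers every item, so that results can be reported per item for the group but not per pupil for the entire test.

```python
import random
from collections import defaultdict


def assign_matrix_samples(item_ids, pupil_ids, items_per_pupil, seed=0):
    """Deal out a subset of items to each pupil (hypothetical scheme).

    Items are shuffled once and then handed out cyclically, so no pupil
    receives the entire test, and the whole test is covered whenever
    len(pupil_ids) * items_per_pupil >= len(item_ids).
    """
    if items_per_pupil >= len(item_ids):
        raise ValueError("each pupil must receive fewer items than the full test")
    rng = random.Random(seed)
    pool = list(item_ids)
    rng.shuffle(pool)
    assignments = {}
    start = 0
    for pupil in pupil_ids:
        assignments[pupil] = [
            pool[(start + k) % len(pool)] for k in range(items_per_pupil)
        ]
        start += items_per_pupil
    return assignments


def group_percent_correct(responses):
    """Summarize results for the group, item by item.

    `responses` is an iterable of (pupil, item, is_correct) tuples.
    Only group-level percentages are returned; nothing is reported
    about an individual pupil's performance on the entire test.
    """
    right = defaultdict(int)
    seen = defaultdict(int)
    for _pupil, item, is_correct in responses:
        seen[item] += 1
        right[item] += int(is_correct)
    return {item: 100.0 * right[item] / seen[item] for item in seen}


if __name__ == "__main__":
    items = list(range(1, 13))                  # a 12-item test (assumed)
    pupils = ["P1", "P2", "P3", "P4", "P5", "P6"]
    plan = assign_matrix_samples(items, pupils, items_per_pupil=4)
    for pupil, subset in plan.items():
        print(pupil, sorted(subset))
```

With six pupils and four items apiece, every one of the twelve items is answered by two pupils, yet no pupil sees more than a third of the test.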

Recommendation Six: English teachers should maintain vigilance
of test validity.

Teachers should continue to examine what they consider to be the
validity of the content of standardized English tests already a part of 
a continuing program of assessment and of other tests available to 
them for their own purposes. 






The following guidelines developed by the aforementioned California
English Language Assessment Advisory Committee may help
other English teachers in evaluating both the content and format of
standardized English tests they are now using or are considering for
possible use:

The Committee also recommends the following guidelines representing
specific application of the preceding principles as further
aids for whoever will be responsible for developing a valid test
of competencies in the uses of English in accordance with the
Greene Bill:

1. Test items should reflect an awareness that a child's initial
development of language competence is in the dialect or language
acquired within his linguistic environment. Although
it may not be possible to construct culture-free tests, considerable
effort should be devoted to developing tests that are
culturally fair.

2. Questions requiring pupils to discriminate among choices of
usage and diction should be based upon the criteria of what
is appropriate to a speaker, his audience, and the situation
in which the usage is to be uttered rather than upon any traditional
concept of so-called "correctness" in the use of the
English language in the abstract.

3. Accordingly, questions related to usage should specify the
speaker and the situation in which these items of usage are
to be considered, thereby indicating the relationship between
the speaker, the audience, and the context in which the individual
is speaking.

4. Test items related to questions of English usage should also
reflect the principle that informal English is appropriate in
many contexts.

5. Questions related to appropriateness of usage and diction
should be intended further to encourage pupils and teachers
to accept the natural and simple use of the English language
rather than the pedantic and the awkward.

6. Settings in which the language items are being used and the
items in questions should be appropriate to the maturity levels
and interests of students being tested. For example, items
testing pupils' vocabulary and command of syntax should be
appropriate to the grade level being tested.

7. Many items should be written at a level of difficulty in reading
below that at which students are being tested.






8. Items should represent the range of practical and academic
applications of the use of the English language that students
face.

9. Items should not include specific grammatical terminology
unless the meaning of such terms is easily discernible in the
context of the item or is directly defined.

10. The basic recommendation on the format of any test is that
subtests should be designated as single sample with all test
items derived from the example. All items within a subtest
should be related to a sample situation or setting, thereby
maintaining the unified or holistic nature of the language
arts. Meaning is frequently derived more from context than
from single words, phrases, and sentences; to test for any
concept of effective written expression out of context can
seem confusing and unreal. The sample itself may be a letter,
paragraph, editorial, news story, or even dictionary entry,
so long as it provides a range of possible questions on effective
expression and is in itself free of gross errors.

11. Grouping and Format of Items:

In each subtest similar test items should be arranged together
so that the student taking the test can focus on sentence
patterns, sentence manipulation, punctuation, diction, and
all other test elements one at a time. The physical layout of
the single sample ideally would have the sample presented
once as an uninterrupted piece; for examination purposes
the sample could appear as the left-hand column and the
test items as the right-hand column on the same page. Items
may be of the true-false or multiple choice varieties with
preference for the latter.

Almost all standardized tests related to pupils' command of the
English language include items based upon the testmakers' concept
of what is acceptable usage and diction. At least seven of the above
guidelines alert testmakers and teachers to the complexities and
subtleties of language usage and to the difficulty of trying to test this
phenomenon. The principles in the guidelines may help in appraising
the content and format of standardized tests or may suggest
other features closely related to the teacher's individual circumstances.

The following resolution, Students' Right to Their Own Language,
adopted by the Executive Committee of the NCTE Conference
on College Composition and Communication in March
1972, and approved at the official business meeting of the CCCC in
April 1974, may serve also to remind teachers of the desirability of
having an open attitude toward language usage, particularly that of
pupils, and may offer them a reinforced criterion for judging whether
or not the current status of language arts is adequately reflected in
standardized English tests:

We affirm the students' right to their own patterns and varieties
of language — the dialects of their nurture or whatever dialects in 
which they find their own identity and style. Language scholars
long ago denied that the myth of a standard American dialect has
any validity. The claim that any one dialect is unacceptable
amounts to an attempt of one social group to exert its dominance
over another. Such a claim leads to false advice for speakers and
writers, and immoral advice for humans. A nation proud of its diverse
heritage and its cultural and racial variety will preserve its
heritage of dialects. We affirm strongly that teachers must have
the experiences and training that will enable them to respect diversity
and uphold the right of students to their own language.

Although such a position statement may be unsettling to some
teachers, it should incline them, nevertheless, to reconsider carefully
the nature of their own attitudes toward diversity in language. They
might examine also some implications not only for their treatment of
language in their classes and in their relationships with individual
pupils but also for the nature of the standardized English tests that
they favor.

Recommendation Seven: English teachers should seek the support
of professional associations.

Teachers dissatisfied with English tests now in use in their schools
need not despair in silence. The NCTE and its affiliates often pass
resolutions stating teachers' strong convictions about kinds and uses
of tests. Some of these resolutions are quoted or referred to in this
discussion. The continuing account of activities in California illustrates
what English teachers can do about these problems. In 1971
the California Association of Teachers of English (CATE), a state affiliate
of NCTE, approved a resolution growing out of widespread
disapproval of the statewide use of a standardized test selected by
the State Board of Education. The test was used to comply with requirements
established by the California State Legislature in 1968,
whereby achievement testing in basic skills courses became mandatory.
Some members of CATE made a detailed, highly critical analysis
of the English test then being used and published part of their report
in the "CATE Curriculum Newsletter" (Spring 1971). Some
high school English Departments also spoke out against the test and
its use in California. These and other protests culminated in the following
CATE resolution published in the same newsletter: "RESOLVED
That the California Association of Teachers of English
urge the California State Board of Education to abandon further use
of the Iowa Tests of Educational Development, Form Test 3,
Correctness and Appropriateness of Expression, and declare the results
of the tests for 1969 and 1970 as meaningless for the State of
California." Whatever the consequence of this action may have
been, the California State Department of Education appointed, one
year later, the English Language Assessment Advisory Committee
that wrote the Guidelines. Following the Guidelines, the State Office
of Program Evaluation and Research took the initial steps in developing
a tailor-made test of the effectiveness of written expression
and spelling to be used first in 1974-75.

Recommendation Eight: English teachers should consider creating
tailor-made tests.

As already mentioned, the ETS surveys of statewide programs report
that several states are using, and others are in the process of creating,
tests tailored to fit their educational goals, programs, and other
circumstances. Many tailor-made tests also are designed to help pupils
and teachers identify some noncognitive elements of their educational
experiences. As indicated earlier, an increasing number of
states are, or will be, using criterion-referenced test instruments. For
example, in the ETS survey of assessment (p. 42), Michigan reported
that its State Department of Education was coordinating a project
involving cooperation between the California Test Bureau-McGraw-Hill,
Inc., and four local Michigan school districts in developing
criterion-referenced tests based upon state specifications. These tests
were to be used in the 1973 administration of the assessment program.

Because most, if not all, teachers who create their own class tests
are already designing some form of criterion-referenced examinations,
they may wish to consult some aids on the development and
use of such tests. One helpful reference is Robert B. Carruthers'
pamphlet, "Building Better English Tests, A Guide for Teachers of
English in Secondary Schools." In it, Carruthers analyzes and illustrates
fundamental topics such as planning the test, basic characteristics
of effective tests and test questions, selecting proper test
questions, building effective short-answer items and essay questions,
and reviewing the results of the test.

A criterion-referenced test offers the advantage of being suited to
local circumstances and limited to a small segment of a course or
program. But problems inhere also in judging the results of a pupil's
performance. For instance, how many correct answers must a pupil
make before a particular objective is considered achieved? If the
test is applied to a program rather than to a unit in a particular
course, at what grade level is he or she expected to be able to meet
particular objectives? Do the objectives apply to all pupils in that
grade? Designers and local users of criterion-referenced tests have
to answer these and other questions in making educational decisions
involving the selection and use of tests and their results.
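
The first of these questions, how many correct answers count as achieving an objective, can be made concrete with a short Python sketch. The 80 percent cutoff used here is purely an assumption chosen for illustration; nothing in the sources discussed above prescribes it, and the function names are hypothetical. The sketch simply shows that once a cutoff is chosen, results can be reported as the percentage of pupils mastering an objective, and that moving the cutoff changes the report.

```python
def objective_mastered(correct, total, cutoff=0.8):
    """Return True if a pupil's share of correct answers meets the cutoff.

    `cutoff` is a local decision, not a fixed standard; 0.8 is only an
    assumed default for this illustration.
    """
    if total <= 0:
        raise ValueError("an objective needs at least one item")
    return correct / total >= cutoff


def percent_mastering(scores, total, cutoff=0.8):
    """Percentage of pupils who meet the cutoff on one objective.

    `scores` lists the number of items each pupil answered correctly
    out of `total` items written for that objective.
    """
    if not scores:
        raise ValueError("no pupil scores supplied")
    mastered = sum(objective_mastered(s, total, cutoff) for s in scores)
    return 100.0 * mastered / len(scores)


if __name__ == "__main__":
    scores = [5, 4, 3, 2]   # four pupils, five items on one objective (assumed data)
    for cutoff in (0.6, 0.8, 1.0):
        print(cutoff, percent_mastering(scores, total=5, cutoff=cutoff))
```

Running the sketch with several cutoffs makes plain how much the reported level of mastery depends on a decision that designers and local users must make for themselves.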

In considering uses of tailor-made and standardized tests, English
teachers may wish to give special attention to the use of tests as a
means of exploring pupils' experiences with literature. As will be
seen in the reviews of standardized tests of literature presented earlier,
the tests may place a premium upon a pupil's memorized
knowledge of aspects of literature rather than upon ascertaining the
nature and range of the students' responses to a literary selection or
to a variety of selections in a unit. In other words, how can a teacher
attempt to ascertain some aspects of a student's affective experiences
with literature? Some help on these subtle but basic qualities
in a reader's responses may be found in the Carruthers pamphlet;
however, more extensive information may be obtained from Purves
and Beach's Literature and the Reader: Research in Response to
Literature, Reading Interests, and the Teaching of Literature.

A list prepared by the Dallas, Texas, Independent School System
could serve as an additional guide for teachers and schools interested
in creating and using criterion-referenced tests. The following
is this author's version of the list, edited to suit the purposes
here:

1. A criterion-referenced test (CRT) evaluates what a student
knows or does not know. The student is evaluated against
the objective, not against national norms or the achievements
of other students.

2. CRTs are based upon a set of specified instructional objectives
which describe the developmental instruction program.
These objectives identify the act, define the conditions under
which it is to occur, and often describe the standard of acceptable
performance.

3. The use of criterion-referenced tests should tend to make
educational objectives apparent and provide information
about what the individual student can or cannot do.

4. Measurement of achievement can be defined as the assessment
of terminal criterion behavior: a student's performance
with respect to specified standards.

5. Achievement measurement is directed toward a student's
present performance; whereas, aptitude is related to both
present performance and prospects of future attainment.

6. Minimum levels of performance need to be specified in behavioral
terms and to describe the least amount of competence
the student is expected to attain at the end of the
instruction-learning. Information about conditions or instructional
treatments also can be provided.

7. Criterion-referenced tests are developed to determine the relationship
between a student's performance and objectives of
teaching-learning. Because the tests are not normed, the
usual reporting of results in reference to a population based
upon a "normal curve" is not relevant. Results are usually
reported instead in terms of the percentage of students mastering
the objective.

8. Criteria for "acceptable" performance can be established by
comparing the performance of students with those of other
students and by developing absolute standards of excellence.

9. Pre-testing is important because the criterion of behavior at
the end of instruction alone does not dictate methods of
teaching, but differences between a student's behavior before
and after instruction do.

10. CRTs are devised to help make decisions about individuals
and educational treatments.

11. Variability in performances is irrelevant in analyzing results
of a CRT. The significance of a performance is not dependent
upon comparisons with scores made by others.

12. The chief factor in the construction of a CRT is that each
item is an accurate reflection of a criterion behavior.

13. A CRT may result in each participant's getting a perfect
score. Thus, the typical index of internal consistency (split
half) is not appropriate for a CRT.

14. Procedures for examining validity of content are more suited
to CRTs.






15. In a CRT, an item that doesn't discriminate among responses
need not be eliminated from the test if it reflects an
important attribute of the criterion.

16. When decisions are to be made on a number of individuals
and comparisons of individuals are necessary, NRTs are
used. But when individualized instruction increases and information
about competencies an individual has or does not
have is needed, CRTs are used.

17. In constructing a CRT, one should consider the extent of the
discipline to be measured, the number of objectives to be
covered, the number of items for each objective, the method
of scoring, the amount of time required for administering
the test, and the manner in which test results are to be registered.
A CRT must have relevance, objectivity, and specificity.
Indices of difficulty and discrimination should be
homogeneous.
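
Point 13 in the list above may be easier to see with a small worked example. The Python sketch below is an illustration only, with a hypothetical function name and made-up data: it computes a simple split-half correlation between pupils' scores on the odd-numbered and even-numbered items (without the Spearman-Brown adjustment often applied in practice). When every pupil earns a perfect score, as can happen on a CRT, the half scores do not vary at all and the correlation cannot be computed, which is one way of seeing why this index is said to be unsuitable for such tests.

```python
from math import sqrt


def split_half_correlation(item_matrix):
    """Pearson correlation between odd-item and even-item half scores.

    `item_matrix` holds one row per pupil; each row is a list of item
    scores (1 = correct, 0 = incorrect). Returns None when either half
    has no variability, because the correlation is then undefined.
    """
    odd = [sum(row[0::2]) for row in item_matrix]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]   # items 2, 4, 6, ...
    n = len(item_matrix)
    mean_odd = sum(odd) / n
    mean_even = sum(even) / n
    cov = sum((o - mean_odd) * (e - mean_even) for o, e in zip(odd, even))
    var_odd = sum((o - mean_odd) ** 2 for o in odd)
    var_even = sum((e - mean_even) ** 2 for e in even)
    if var_odd == 0 or var_even == 0:
        return None
    return cov / sqrt(var_odd * var_even)


if __name__ == "__main__":
    # Assumed data: three pupils, four items each.
    varied = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
    all_perfect = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
    print(split_half_correlation(varied))
    print(split_half_correlation(all_perfect))
```

The second call returns None: with no spread in scores there is nothing for the index to measure, even though the pupils' performance against the objectives is perfectly clear.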

Teachers in states not already using criterion-referenced tests may
wish to obtain samples of some now available elsewhere. These
may prove to be useful models, a guide for teachers interested in
fashioning their own versions. The models offered by Carruthers
may help English teachers improve their measurement of their students'
achievement at the end of a unit or course. Such ventures
might be especially worthwhile for those who find little connection
between their students' classroom achievement and the results of
standardized, norm-referenced tests.

Recommendation Nine: English teachers should be sure tests are 
administered properly. 

The ETS surveys report that in almost all instances it is the classroom
teacher who administers tests to students. No doubt the states
provide each teacher with explicit instructions on exactly how the
tests are to be administered. Nevertheless, the checklist below, from
the Dallas, Texas, Independent School System, offers helpful reminders
to any teacher administering any sort of test:

Yes No 

1. Were the students allowed to use the restrooms
before testing?

2. Was there a testing sign on the room door?

3. Were any students excluded from testing before
or during the test by the teacher?






4. Did any of the students refuse to take the
test?

5. Was there a sufficient amount of testing
materials and pencils?

6. Were the appropriate marking techniques
demonstrated to the students?

7. Were the students shown how to cross out a
mark completely if they wished to change
their answer to a particular question?

8. Did the administrator attempt to create a
relaxed atmosphere indicating that he
expected each student to do his best?

9. Were sample items explained, if there were any
sample items?

10. During the demonstration of a sample item,
was the test booklet held on a level with the
students' eyes as they were seated?

11. Were all the students able to see the teacher
from where they were working?

12. Were seating arrangements made to discourage
cheating?

13. After the explanation of the sample item, were
the students asked if they had any questions?

14. Were the directions repeated for any of the
sample items?

15. Were the directions repeated for any of the
scored items?

16. Were directions given in a clear, natural, and
pleasant voice?

17. Were the test instructions read verbatim from
the test manual?

18. Were any personal assistance or hints given to
any of the students on test items?

19. Were the students constantly checked to ensure
they were working on the correct page and
item?

20. Were markers used by the students to help
them keep their place?

21. Were there any interruptions during the
administration of the test? If yes, how many?

22. Were the testing sessions scheduled as
recommended by the test manual?






23. Were any of the time limits set for individual
tests or items altered?

24. Was a brief rest period provided between two
tests or test sessions?

25. Was the classroom atmosphere relaxed during
testing?

26. Were students talking during the testing
session?

27. Was the testing stopped at the appropriate
time?

28. Did the teacher seem to have control over the
entire testing situation?

29. Were irregularities recorded and sent with test
materials for scoring?

30. Were efforts made to insure that all students
understood the instructions?

31. Did the test administrator paraphrase or
present anything new to the standardized
instructions?

32. If questions were asked relative to "guessing,"
did the administrator reread the instructions
which refer to guessing or inform students to
"do the best you can" if there were not instructions
on guessing?

33. If the administrator used a timer, did he keep
up with the regular time and mark the regular
time down as a backup device?

34. Was overtesting avoided during a single day to
avoid anxiety and boredom?

Recommendation Ten: English teachers should be sure tests are not
dehumanizing.

A final reminder here is that all tests are designed for and administered
to human beings. Permeating English teachers' comments
about standardized tests is their concern lest the use of such instruments
and the corollary antecedent (quantifiable behavioral objectives)
should lead to dehumanizing of English as a school subject
and to giving pupils the impression of being treated mechanistically.
Undoubtedly, all English teachers consider themselves guardians of
the humanistic tradition in life, in general, and in education, in particular.
The uses of tests and this genuine uneasiness of English
teachers need not be antagonistic. Trends discussed earlier suggest a
growing compatibility between some forms of testing and the larger
goals of a subject like English: notably, the establishing of statewide
goals recognizing cognitive and noncognitive outcomes of education;
the increased use of assessment to identify affective aspects of pupils'
educational experiences; and increased use of criterion-referenced
and other tests tailored to fit school and nonschool circumstances
of an individual or of a particular group of pupils.

The following list, also from the Dallas school system, seems in
keeping with English teachers' concern with the child as a person
and offers wholesome reminders to anyone using any kind of test:

1. Every pupil has a worth that is not measured by any test. 
X It is for the good of the student that tests are administered. 

3. Te5t results must be supplemented by factors such as the 
following: 

(1) sttiitoit fiiQton — motivation, aspirations, temporary and 
permanent health, home environment, and previous 
school environment; 

(2) school facton — ctirrictiltim, textbooks used, teaching 
material and supplies, general adequacy of school plant 
and equipment, type and extent of supervision, adminis- 
trative policies* and general harmony within school staff; 

(3) cottimumty fucton — type (urban, suburban, or rural)* 
poptHation (foreign or native, heterogeneous or homogc- 
neotis)t general level of culture, interest in educational 
matters* financial support of schools and cooperative ncss 
toward school administration. 

4. The norms ot the tests are not the goals for all children to 
reach. 

5. Realize that tests evaluate only part of the desired outcomes. 

6. 1 he test5 should not be used ibr invidious comparison among 
pupils and schools. 

7. ir a teacher berates or scolds a child because of poor perfor- 
mance on a test, she may be building up unfavorable attitudes 
toward future testing. 

8. Teacher failures that may affect testing and test scores are (a) 
failing to let a pupil know his results, (b) feeling insecure and 
threatened by test results, (c) being unsympathetic to the test- 
ing program and giving sarcastic references to it. 

9. The teacher must realize that the testing activities contribute
to the improvement of the child's learning.






10. Following an examination, it is proper and desirable to use
test results in a class discussion of the items and interpretation
of the results. (Individual results should be discussed with
the pupil concerned.)

11. Do not disclose individual pupils' results in such a way as to 
permit comparisons among pupils. 

12. Guard against the temptation of viewing the students as col- 
lections of test scores. 

13. Review the student and/or class profile. 

14. Seek the help of specialists.

15. Always consider that tests have errors. 

16. Students with low scores may need special instruction and
specific skill drill. Students with high scores will need
enrichment.

17. If all other subtests are good and one is poor, it may indicate a
lack of transfer of skills. Some skills are prerequisites to other
skills.

18. Percentiles are probably the safest and most informative
scores to use when dealing with the public.

19. Again, it is important to reiterate that the test results must be 
integrated with informal observations by the teacher, stu- 
dent's interests, grades received, and actual performance of 
the student. 

20. Keep in mind that the only justification for the time and 
money spent on tests is that some beneficial educational re- 
sults will be gained. 

21. Test results should be used in association with other informa- 
tion concerning the student's background and environment. 

22. The graphic presentation of a profile is one of the most helpful
ways to analyze test results. The profile helps to make
inter-individual comparisons with others of a similar category
and intra-individual comparisons with examinee's own scores.

23. A single best occupational choice should not be inferred from 
the Occupational Interest Inventory.

24. Test results should be interpreted by competently trained
professional personnel when they are to be used as a basis for
decisions that are likely to have a major influence on the
student's future.

25. The interpreter should constantly keep in mind what the test 
measures. 

26. Comparable scores on two tests may not give comparable 
meanings. 






27. Consider how the score will affect the person receiving the
information.

28. Always consider that no test measures without error.

29. If available, local norms should be used in order to make
comparisons with one's peers.

30. Individual conferences should be used to analyze personal test
results. Group conferences should be used to explain the test
program and basic test information.

31. Identify areas of strengths and weaknesses as revealed by
average scores.

32. Parents have the right to know whatever the school knows 
about the abilities, the performance and the problems of their 
children. 

33. Study guidelines for test interpretation provided by the
District and the test publisher.

34. Hold meetings on test interpretation. 

35. Provide for parental conferences.

36. Study all test results data carefully and use item analysis.
Report for diagnostic and prescriptive purposes.

37. Be mindful of the fact that the tests are estimates of the
variables measured and that they have errors and limitations.

38. Use the multiple sources available to supplement the interpre- 
tation of the test data. 

39. Secure professional assistance for test administration and in- 
terpretation when needed. 

Notes 

1. Henry S. Dyer, "The State Assessment Survey." Paper delivered
to the Association of American Publishers, Washington, D.C., April
29, 1971 (mimeographed).

2. Henry Chauncey and John E. Dobbin, Testing: Its Place in
Education Today (New York: Harper & Row, 1964), pp. 102-104.

3. Ned A. Flanders, "The Changing Base of Performance-Based
Testing," Phi Delta Kappan 55, no. 5 (January 1974): 312.
Emphasis added.

4. Lee J. Cronbach, "Mental Test and the Creation of Opportunity."
Paper delivered to the American Philosophical Society, April
1970 (mimeographed).

5. Edmund J. Farrell, Deciding the Future: A Forecast of
Responsibilities of Secondary Teachers of English, 1970-2000 A.D., Research
Report No. 12 (Urbana, Ill.: National Council of Teachers of
English, 1971), p. 141.

6. James Sabol, "Bellevue Public Schools, Specifications for
Evaluation Strategy," in Uses, Abuses, Misuses of Standardized
Tests in English (Urbana, Ill.: National Council of Teachers of
English, 1974) out of print.

7. Susan Wood, Evaluation of Published English Tests, DPI
Bulletin No. 144, Wisconsin English-Language Arts Curriculum Project
(Madison, Wis.: Wisconsin Department of Public Instruction,
1967). This report was written under the direction of Robert C.
Pooley, then a professor of English at the University of Wisconsin,
Madison.

8. Oscar Krisen Buros, editor, The Sixth Mental Measurements
Yearbook (Highland Park, N.J.: The Gryphon Press, 1965); and
The Seventh Mental Measurements Yearbook, 2 vols. (Highland
Park, N.J.: The Gryphon Press, 1972).

9. Evaluation of Published English Tests, pp. 85-86.

10. "Students' Right to Their Own Language," College Composition
and Communication 25 (Fall 1974). See this Special Issue for an
analysis of the nature and implications of dialects, of the teaching of
our "grammar," of implications for standardized tests, of the
knowledge about language needed by teachers of English. Note also
the bibliography of 129 entries.

11. For further information about the advantages and disadvantages
of norm-referenced and criterion-referenced tests, see W.
James Popham, "An Approaching Peril: Cloud-Referenced Tests,"
Phi Delta Kappan 55, no. 9 (May 1974): 614-615.

12. Robert B. Carruthers, Building Better English Tests, A Guide
for Teachers of English in the Secondary School (Urbana, Ill.:
National Council of Teachers of English, 1963). Available from ERIC
Document Reproduction Service, ED 038 385.

13. "Some Things Parents Should Know about Testing: A Series of
Questions and Answers," Test Department (New York: Harcourt
Brace Jovanovich, Inc.), p. 4.

14. Alan C. Purves and Richard Beach, Literature and the Reader.
Research in Response to Literature, Reading Interests, and the
Teaching of Literature (Urbana, Ill.: National Council of Teachers
of English, 1972).




15. Dallas, Texas, Independent School System, "Criterion-Referenced
Tests," in Uses, Abuses, Misuses of Standardized Tests in
English (Urbana, Ill.: National Council of Teachers of English,
1974) out of print.

16. For example, the New York State Education Department
provides schools with "test-loan packets" to help them develop their
own programs of standardized tests and thereby have samples of
tests readily available to them (ETS Testing Programs, p. 31).

17. Dallas, Texas, Independent School System, "Checklist for the
Administration of Standardized Tests," in Uses, Abuses, Misuses of
Standardized Tests in English (Urbana, Ill.: National Council of
Teachers of English, 1974) out of print.

18. Dallas, Texas, Independent School System, "Some Considerations
for Teachers in Using Tests and Test Results," in Uses,
Abuses, Misuses of Standardized Tests in English (Urbana, Ill.:
National Council of Teachers of English, 1974) out of print.






Afterword 

Alfred H. Grommon



In this report by the NCTE Committee to Review Standardized
Tests, Parts One and Two have been devoted to establishing some
background about testing programs and to identifying trends in
statewide programs of testing and assessment, particularly those
trends that may guide English teachers and others in reducing
problems posed by conventional uses of standardized tests. Part One, in
particular, offers recommendations that may help teachers already
involved in using standardized tests or in considering their use. The
review of these matters and suggestions is intended to help teachers
avoid some unnecessary difficulties arising from their considering
tests somewhat in isolation or at least in a context too restricted to
enable them and their students to derive benefits from whatever
assessment program they are involved in. On the positive side,
suggestions are offered for the improvement of testing, such as the creation
of tailor-made tests or the emulation of an exemplary statewide
program of assessment.

The major purpose of this report lies, however, in Part Two where
readers are offered reviews of many published standardized tests,
some of them in wide use nationally. Those reviews focus on the
validity of such tests. The reviewers were concerned with the
relationship of the content of each test to what is now known about the
subject matter of English and about the treatment of particular
aspects in English classes. They were concerned also about the
relationship of that version of subject matter to the diverse cultures and
learning styles represented by the wide variety of pupils. They were
not concerned, however, with such other important features of
standardized tests as reliability of items, nature of the sampling of
test-takers, standardization of norms, treatment and interpretation of
scores.

The ad hoc Committee was charged by the NCTE Committee on
Research to devote its attention to commercially prepared and
published standardized tests designed to evaluate students' command of
the English language, grammar, usage, diction, spelling, the
effectiveness of their written expression in English, and their knowledge
of and responses to literature. Reading tests were not included.

As the distribution of reviews in Part Two indicates, most of the
available standardized tests are intended to measure students'
command of aspects of English grammar, usage, punctuation,
capitalization, and spelling. Fewer tests are available to assess the ability to
write English effectively and to respond to literature sensibly.
Undoubtedly, English teachers already experienced in creating their
own criterion-referenced or content-referenced tests fill in such gaps
in standardized tests with their own means of evaluating their
students' writing ability and response to literature. The Committee
hopes some of the preceding suggestions and materials and the
reviews of published tests may encourage other teachers of English
also to venture into test-making.

Although the Committee made an extensive search for tests
related to its assignment, it realizes that it must have inadvertently
overlooked some that may be more valuable and even more widely
used than those it was able to obtain from publishers. Some English
tests, however, were intentionally omitted, particularly those used as
a part of the National Assessment of Education Program and those
used by the College Entrance Examination Board because those
tests are not generally available to English teachers for their own
purposes. Then too, given the time between the inception of the
Committee and the publication of its report, it was inevitable that
some published tests would go out of print or be so extensively
revised that their original review was no longer applicable.

As an outgrowth of the programs of testing and assessment
throughout the country, tests are undergoing revision, new ones are
being created, and reports in the ETS surveys indicate that still
others are being planned. Almost every state reported that the
problem with top priority is the designing of new tests that are better
suited to measure validly its educational goals, the educational
programs of its local schools, and the diversity of its students.

Consequently, this present report is only an initial effort in the
plans of the Committee on Research to provide the profession of
English teaching with continuing reviews of standardized tests in
English. Thus, periodic reviews of revised and new standardized tests in
English, and of some that may have been overlooked in this report,
are expected to be published by NCTE for the benefit of English
teachers and their students everywhere. Furthermore, the NCTE has
appointed both a Task Force on Measurement and Evaluation in
English and a Committee on the National Assessment Educational
Program. In the last twelve months, NCTE has published the report
of the Task Force as Common Sense and Testing in English and the
report of the Committee as National Assessment and the Teaching
of English.

The contributors to this report sincerely hope their efforts are not
only of some benefit to teachers of English, now and in the near
future, but also offer a short cut to establishing a continuing review of
standardized tests in English that are prepared by publishers to be
sold to teachers of English.






Index of Tests Reviewed 



Buckingham Extension of the Ayres Spelling Scale, 59
California Achievement Tests (CAT)
  Elementary language tests, 61
  Junior and senior high language tests, 81
California Short-Form Tests of Mental Maturity, 83
Clerical Skills Series, 84
Cognitive Abilities Test, 73
College English Placement Test (CEPT), 122
College Qualification Tests, 85
Comprehensive Tests of Basic Skills (CTBS)
  Elementary language tests, 63
  Junior and senior high language tests, 85
Concept Mastery Test, 87
Content Evaluation Series Language Arts Tests
  Composition Test, 122
  Language Ability Test, 103
  Literature Test, 133
Cooperative Academic Ability Test (AAT), 88
Cooperative English Tests, English Expression, 89
Cooperative Literature Tests, 134
Cooperative Primary Tests, 71
Cooperative School and College Ability Tests (SCAT), 90
Cooperative Sequential Tests of Educational Progress (STEP)
  Listening, 109
  Writing, 120, 125
Differential Aptitude Tests (DAT): Spelling, Language
  Usage and Verbal Reasoning, 91
Essentials of English Tests, 92
Evaluation and Adjustment Series, Brown-Carlson Listening
  Comprehension Test, 80
Fundamental Achievement Series (FAS), Verbal, 94
High School Placement Test, 94
Hollingsworth-Sanders Junior High School Literature Test, 128
Hoskins-Sanders Literature Tests, 129
Hoyum-Sanders Junior High School English Test, 96
Hoyum-Sanders English Test, 59




Illinois Tests in the Teaching of English
  Knowledge of Language, 102
  Knowledge of Literature, 130
Iowa High School Content Examination for High School
  Seniors and College Freshmen, 130
Iowa Placement Examinations, English Training, 96
Iowa Spelling Scales, 59
Iowa Tests of Basic Skills, 97
Iowa Tests of Educational Development (ITED), SRA
  Assessment Survey, 99
Kansas Junior High School Spelling Test, 101
Kansas Elementary and Intermediate Spelling Tests, 60
Literature Tests, Objective and Essay, 131
McGraw-Hill Basic Skills System
  Spelling and Vocabulary Tests, 104
  Writing Test, 120, 123
Metropolitan Achievement Tests (MAT), 65
Minnesota High School Achievement Examinations,
  Language Arts, 106
Missouri College English Test, 107
The NCTE Cooperative Test of Critical Reading and
  Appreciation, A Look at Literature, 132
New Iowa Spelling Scale, 59
Responding: Ginn Interrelated Sequences in Literature,
  Evaluation Sequence, 136
Sanders-Fletcher Spelling Test, 108
Sanders-Fletcher Vocabulary Test, 108
Science Research Associates Assessment Survey,
  Achievement Series, 67
Stanford Achievement Test (SAT)
  Primary Level II, Intermediate Level II, and Advanced
    Levels, 110
  High School Arts and Humanities Test, 128
Tests of Academic Progress
  Composition, 113, 120
  Literature, 135
Tests of Adult Basic Education, 114
Tests of Basic Experiences (TOBE), 72
Thurstone Test of Mental Alertness, 115
Walton-Sanders English Test, 115
Wide Range Achievement Test (WRAT), Revised Edition, 116






